Vector Precoding for Gaussian MIMO Broadcast Channels: 
Impact of Replica Symmetry Breaking


Benjamin M. Zaidel, Ralf R. Muller, Aris L. Moustakas, Rodrigo de Miguel

February 28, 2011 




This work was supported in part by the Research Council of Norway under Grant No. 171133/V30,
and by the European Commission under Grant "Newcom++" No. EU-IST-NoE-FP6-2007-216715.

This work was presented in part at the 46th Annual Allerton Conference on Communication, Control &
Computing, Monticello, IL, U.S.A., Sep. 2008 and the 10th International Symposium on Spread Spectrum
Techniques and Applications (ISSSTA), Bologna, Italy, Aug. 2008.

Benjamin M. Zaidel, Ralf R. Muller and Rodrigo de Miguel are with the Department of Electronics and 
Telecommunications, The Norwegian University of Science and Technology (NTNU), Trondheim, Norway, 
e-mail: zaidel@iet.ntnu.no, rodrigo@iet.ntnu.no, ralf@iet.ntnu.no. Aris L. Moustakas is with the Physics 
Department, National and Kapodistrian University of Athens, Athens, Greece, e-mail: arislm@phys.uoa.gr 



Abstract 



The so-called "replica method" of statistical physics is employed for the large system analysis of 
vector precoding for the Gaussian multiple-input multiple-output (MIMO) broadcast channel. The 
transmitter is assumed to comprise a linear front-end combined with nonlinear precoding, that 
minimizes the front-end imposed transmit energy penalty. Focusing on discrete complex input 
alphabets, the energy penalty is minimized by relaxing the input alphabet to a larger alphabet 
set prior to precoding. For the common discrete lattice-based relaxation, the problem is found to 
violate the assumption of replica symmetry and a replica symmetry breaking ansatz is taken. The 
limiting empirical distribution of the precoder's output, as well as the limiting energy penalty, 
are derived for one-step replica symmetry breaking. For convex relaxations, replica symmetry is 
found to hold and corresponding results are obtained for comparison. Particularizing to a
"zero-forcing" (ZF) linear front-end, and non-cooperative users, a decoupling result is derived according
to which the channel observed by each of the individual receivers can be effectively characterized 
by the Markov chain u-x-y, where u, x, and y are the channel input, the equivalent precoder 
output, and the channel output, respectively. For discrete lattice-based alphabet relaxation, the 
impact of replica symmetry breaking is demonstrated for the energy penalty at the transmitter. 
An analysis of spectral efficiency is provided to compare discrete lattice-based relaxations against 
convex relaxations, as well as linear ZF and Tomlinson-Harashima precoding (THP). Focusing on 
quaternary phase shift-keying (QPSK), significant performance gains of both lattice and convex 
relaxations are revealed compared to linear ZF precoding, for medium to high signal-to-noise ratios 
(SNRs). THP is shown to be outperformed as well. In addition, comparing certain lattice-based 
relaxations for QPSK against a convex counterpart, the latter is found to be superior for low and 
high SNRs but slightly inferior for medium SNRs in terms of spectral efficiency. 



1 Introduction 



The multiple-input multiple-output (MIMO) Gaussian broadcast channel (GBC) is the focus of 
many research activities, addressing the growing demand for higher throughput wireless systems, 
and in particular the increasing use of multiple-antenna systems in essentially all modern wireless 
standards (see, e.g., [1]-[3]). The capacity region of the MIMO GBC is the dirty paper coding (DPC)
[4] capacity region [5], and several attempts have been made in recent years to propose practically
oriented approaches for implementing DPC, e.g., [6]-[8]. DPC still remains, however, a difficult,
computationally demanding task, which motivates the search for more practical (suboptimum) 
precoding alternatives. 

Since linear precoding, such as zero-forcing (ZF), leads to reduced performance (especially when 
the channel is ill-conditioned), much attention has been given to nonlinear precoding schemes. In 
particular, lattice-based precoding approaches have often been investigated, as for example the 
vector perturbation approach suggested in [9] (see also [10] for a general framework). The vector
perturbation approach was inspired by the idea of Tomlinson-Harashima precoding (THP) [11], [12].
In this scheme, a scaled complex integer vector is added to each data vector, chosen to
minimize the energy penalty imposed by a linear zero-forcing (ZF) front-end. A modulo function
is employed at the receivers, uniquely determining the transmitted symbols in the absence of noise.
An analogous precoding scheme based on a linear minimum-mean-squared-error (MMSE) front-end
was considered in [13]. An approach based on optimizing mutual information was taken in [14].
Vector perturbation is, however, still complex, as it involves the solution of an NP-hard
integer-lattice least squares problem (commonly implemented using the sphere-decoding algorithm [15]).
Addressing the complexity aspect of the method, related approaches can also be found, e.g., in
(see references therein for additional literature in this framework), where lattice-basis
reduction techniques are employed.

The analytical performance analysis of such nonlinear precoding schemes is not at all trivial. It 
is common to consider, therefore, uncoded symbol error probabilities (via simulations), asymptotic 
capacity scaling laws and diversity orders (the asymptotic slope of the error probability in the high 
signal-to-noise ratio (SNR) regime), or to employ Monte-Carlo simulations to obtain
information-theoretically achievable rates (see e.g., [9], [13], [16]-[19], and also [20] for a semi-tutorial review in
this respect). The energy penalty induced by the linear front-end is another commonly addressed 
performance measure. A lower bound on the energy penalty based on lattice theoretic arguments 
can be found in [21]. The optimum constellation shaping for a ZF front-end (in terms of the energy
penalty), allowing for data to be independently decoded by the users, is investigated in [22], where
a selective mapping technique is introduced based on random coding arguments, implementable
using nested lattice coding in a trellis precoding framework (see also [23] for a more recent study
on selective mapping).

The energy penalty minimization was also investigated in [24], where another nonlinear
precoding approach in this framework was recently proposed. The transmitter comprises a linear
front-end combined with nonlinear precoding. The nonlinear part relies on relaxation of the
transmitted symbols' alphabets to larger alphabet sets. The idea is to optimize the vector of transmitted
symbols over the extended alphabet sets, so as to minimize the energy penalty imposed by the
linear front-end, which is essentially the idea behind vector perturbation. However, a notable
feature of this precoding scheme is that it can also be combined with convex extended alphabet
sets (in contrast to [9]), lending themselves to efficient practical energy minimization algorithms.
It can be considered in this sense as a generalization of the vector perturbation scheme (see also
[25] in this respect).

Another interesting contribution of [24] is the harnessing of statistical physics tools for the
analysis of the nonlinear precoding scheme, while considering the large system limit in which 
both the number of users and the number of transmit antennas grow large, while their ratio 
goes to some finite constant. One of the main objectives of statistical physics is the quantitative 
description of macroscopic properties of many-body systems while starting from the fundamental 
interactions between microscopic elements. In this framework, a general tool for the analysis of 
random ("disordered") systems, referred to as the "replica method", was originally invented for 
the analysis of spin glasses. The latter term describes a spin orientation that has similarity to the
type of location of atoms in glasses, which are random in space but frozen in time [26]. However,
the replica method turns out to have a much wider range of applications (see, e.g., [26], [27] for recent
tutorial manuscripts). In recent years, in particular following Tanaka's pioneering work [28], the
method has been successfully applied to various problems in wireless communications. The replica
method has also been recognized by now as an important tool for information-theoretic analyses
in cases where "conventional" random matrix theory does not apply. Although the replica method
is heuristic in nature, extensive simulations and exact analytical results in the literature suggest
that the replica analysis generally yields excellent approximations in many cases of interest (see
again [26], [27], and also, e.g., [28]-[31] and references therein).

The replica analysis usually employs a number of underlying assumptions regarding the be- 
havior of the quantities in concern in the large-system limit. One such fundamental assumption 
is the "self-averaging" property, which relies on the expectation that macroscopic properties of 
large random systems converge to deterministic values as the system dimensions grow large. Self- 
averaging is a property of most physical systems with large (or infinite) degrees of freedom, and is 
the result of the high probability of occurrence of typical events or samples. Nevertheless, in the 
case of glassy systems this property has not been trivial to prove, since the underlying randomness
of the interactions makes the system inherently non-ergodic. The self-averaging property for the
conventional so-called Sherrington-Kirkpatrick (SK) spin glass model [32] was first proven in [33].
More recently, [34], [35] extended it to a more general class of spin glass models and, using an
ingenious method, showed that the averages over the disorder actually do converge in the large
system limit. Even more recently, code-division multiple-access (CDMA) systems were shown to
be self-averaging [36]. Although the particular system we study is not explicitly covered by the
above analysis, it can readily be proved to be self-averaging using the same method. For the sake
of space, we will not cover the proof here; however, we will refer to self-averaging as a fundamental
property of the large system limit rather than an assumption. Another common assumption in
replica analyses is that of replica symmetry (RS) (see, e.g., [24], [28], [30]), according to which it
is assumed that the crosscorrelations between replicated microscopic system configurations are
independent of the replica indices. The RS assumption, however, is known to produce incorrect
conclusions for certain physical quantities such as, e.g., the minimum energy configuration. This
led to the development of the replica symmetry breaking (RSB) theory [26], [37]. Recently, the full
RSB solution of the SK spin glass model, first proposed in [38], was shown to be an upper bound to
the minimum energy configuration [39] and later to be the exact solution of the model [37]. Apart
from its general seminal importance, it is profoundly relevant in the context of vector precoding
because the SK-model is a particular case of the more general models discussed in [24] and in the
sequel.

In this paper we consider a communication system setting in which RSB indeed occurs, and 
demonstrate the significant impact of the RSB treatment on the validity of the approximations 
produced by the replica analysis. We focus here on a wireless MIMO broadcast channel (BC) 
setting, where the transmitter has N transmit antennas and the K users have single receive 
antennas. Full channel state information (CSI) is assumed available at the transmitter, while the 
receivers are cognizant of their own channels only (more on this later). No user-cooperation of any
kind is assumed. The received signals are embedded in additive white Gaussian noise (AWGN). 



The precoding approach considered in [24] is revisited. Note that the focus in [24] is mainly on
presenting the method, and on the derivation of the energy penalty in the asymptotic regime, in
which both the number of transmit antennas $N$ and the number of users $K$ go to infinity, while
$K/N \to \alpha < \infty$ ($\alpha$ is commonly referred to as the system load). Furthermore, the analysis in [24] is
based on the RS assumption. It turns out however that the RS assumption can only produce valid
asymptotic approximations in this setting when the extended alphabets are convex sets (see, e.g.,
supportive simulation results in [25]). In contrast, for the non-convex alphabets considered in [24]
these approximations turn out to be rather loose, and produce overoptimistic results, especially
as the system load gets close to unity. This behavior can be readily observed by comparing the
RS based energy penalty to the asymptotic lower bound of [21].

Here, an alternative analysis is provided based on what is referred to in the statistical physics 
literature as the one-step RSB (1RSB) ansatz, which allows one to search for more general
solutions than the RS ansatz, but does not cover the full complexity of solutions of full RSB. In
addition to an energy penalty analysis, analogous to the one in [24], we complement the results
by providing an information-theoretic perspective of the proposed precoding approach. Coded 
transmissions and achievable throughputs are considered. The employed performance measure is 
the normalized spectral efficiency, defined as the total number of bits/sec/Hz per transmit antenna 
that can be transmitted arbitrarily reliably through the broadcast channel. The limiting marginal 
conditional distribution of the nonlinear precoder's output which is required for the calculation 
of spectral efficiency, as well as the limiting energy penalty, are analytically formulated. Focusing 
on a ZF front-end, the spectral efficiency is expressed via the input-output mutual information 
of the equivalent single-user channel observed by each of the receivers. The analysis is applied 
next to a particular family of discrete extended alphabet sets (following [24]), focusing on a QPSK
input, demonstrating the RSB phenomenon. To complete the analysis we repeat the derivations
while employing the RS ansatz, which is, as said, adequate for convex relaxation schemes, and
the results are then applied to a convex alphabet example [24]. For both extended alphabets,
numerical spectral efficiency results indicate significant performance enhancement over linear ZF 
preprocessing for medium to high SNRs. Furthermore, performance enhancement is also revealed 
compared to a generalized THP approach (which is a popular practical nonlinear precoding alter- 
native for such settings). Comparison of the two types of extended alphabet examples leads to 
interesting conclusions regarding the performance vs. complexity tradeoff of precoding schemes of 
the kind considered here. 

The remainder of this paper is organized as follows. Section 2 describes the system model.
Section 3 provides an outline of the replica analysis and includes some general results. In particular,
it clarifies the concept of RSB which later results are based upon. In order to analyze the mutual
information and later the trade-off between spectral and power efficiency of various precoding
schemes, we need to characterize the limiting conditional distribution of the precoder output.
This task is solved in Section 4, providing a set of nonlinear equations whose solutions characterize
the desired distributions. Section 5 particularizes to the ZF front-end and shows that the channel
model can be represented as an equivalent concatenated single-user channel. Then, it derives the
spectral efficiency of this equivalent concatenated channel. Section 6 particularizes the results of
the previous sections to a discrete lattice-based alphabet relaxation of QPSK. Numerical solutions
of the analytical results are provided. Those based on RSB are shown to match simulation results
while those based on RS are demonstrated to fail. Section 7 is the corresponding counterpart
to Section 6 for convex relaxation. Unlike Section 6, it finds the RS ansatz to provide accurate
approximations. Section 8 presents a comparative analysis of the spectral efficiency of the two
alphabet relaxation schemes against some other precoding approaches. Finally, Section 9 ends
this paper with some concluding remarks.

2 System Model 

Consider the following Gaussian MIMO broadcast channel 

$$ \mathbf{r} = \mathbf{H}\mathbf{t} + \mathbf{n} , \tag{2-1} $$

where $\mathbf{r}_{[K\times 1]}$ is the vector of received signals, $\mathbf{H}_{[K\times N]}$ is the (random) complex channel transfer
matrix, assumed to be of unit expected row norm, $\mathbf{t}_{[N\times 1]}$ is the vector of transmitted signals, and
$\mathbf{n}_{[K\times 1]}$ is the vector of i.i.d. zero-mean proper complex AWGNs at the users' receivers. We denote
the noises' spectral levels by $\sigma^2$ so that $\mathbf{n} \sim \mathcal{N}_c(\mathbf{0}, \sigma^2\mathbf{I})$.

The precoding process at the transmitter is depicted in Figure 1. It is assumed that the users'
messages are independently encoded, and that the encoders produce coded symbols $\{u_k\}_{k=1}^{K}$ taken
from some discrete alphabet $\mathcal{U}$. These symbols are treated as random variables, independent
across users, and subject to the identical underlying discrete probability $P_U(u)$, $u \in \mathcal{U}$. We use
henceforth for convenience (as shall be made clear in Section 5), the following probability density
function (pdf) formulation

$$ dF_U(u) = f_U(u)\,du = \sum_{\tilde{u}\in\mathcal{U}} P_U(\tilde{u})\,\delta(u-\tilde{u})\,du . \tag{2-2} $$

Figure 1: Block diagram of the vector precoding scheme.



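Let $\mathbf{u}_{[K\times 1]}$ denote the vector of the encoders' outputs, i.e., $\mathbf{u} = [u_1, \dots, u_K]^T \in \mathcal{U}^K$. The vector
$\mathbf{u}$ is the input to a nonlinear precoding block that minimizes the energy penalty of the precoder
through input alphabet relaxation (see below), and outputs a $K \times 1$ vector $\mathbf{x}$. The vector $\mathbf{x}$ is then
taken as input to the linear front-end block where it is multiplied by the linear front-end matrix
$\mathbf{T}_{[N\times K]}$, which is, in general, a function of the channel transfer matrix $\mathbf{H}$ (note that $\mathbf{x}$ depends
on $\mathbf{T}$ and, hence, we can use the functional notation $\mathbf{x}(\mathbf{T}, \mathbf{u})$). The result is then normalized so
that the actually transmitted vector $\mathbf{t}$ satisfies an instantaneous total power (energy per symbol)
constraint $P_{tot}$, i.e.,

$$ \mathbf{t} = \sqrt{P_{tot}}\,\frac{\mathbf{T}\mathbf{x}(\mathbf{T},\mathbf{u})}{\|\mathbf{T}\mathbf{x}(\mathbf{T},\mathbf{u})\|} = \sqrt{\frac{P_{tot}}{\mathcal{E}_{tot}(\mathbf{T},\mathbf{x})}}\;\mathbf{T}\mathbf{x}(\mathbf{T},\mathbf{u}) , \tag{2-3} $$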



where $\mathcal{E}_{tot}(\mathbf{T}, \mathbf{x})$ denotes the energy penalty induced by the precoding matrix $\mathbf{T}$, and the particular
choice of $\mathbf{x}$, as well as the average symbol energy of the underlying alphabet (the explicit
dependence on the arguments is omitted henceforth for simplicity). Denoting by $P$ the individual
power constraint per user (taken as equal for all), so that $P_{tot} = KP$, we define the transmit SNR
as

$$ \mathsf{snr} \triangleq \frac{P_{tot}}{K\sigma^2} = \frac{P}{\sigma^2} . \tag{2-4} $$
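The normalization in (2-3) and the SNR definition in (2-4) can be illustrated with a minimal NumPy sketch. The function name `transmit_vector`, the random placeholder front-end matrix, and all numeric values are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def transmit_vector(T, x, p_tot):
    """Scale T @ x so that the transmitted vector has total energy p_tot, as in (2-3)."""
    v = T @ x
    energy_tot = np.linalg.norm(v) ** 2           # total energy penalty ||T x||^2
    t = np.sqrt(p_tot / energy_tot) * v
    return t, energy_tot

rng = np.random.default_rng(0)
K, N = 3, 4                                        # users and transmit antennas (toy sizes)
# Placeholder front-end matrix; in the paper T is a function of the channel H.
T = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
x = np.array([1 + 1j, -1 + 1j, 1 - 1j])            # placeholder precoder output

P, sigma2 = 2.0, 0.5                               # assumed per-user power and noise level
p_tot = K * P
t, energy_tot = transmit_vector(T, x, p_tot)
snr = p_tot / (K * sigma2)                         # equals P / sigma2, cf. (2-4)
```

Because of the normalization, rescaling `x` by any positive constant leaves `t` unchanged, so only the shape of the precoder output matters for the transmitted signal.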

The energy penalty minimization is performed in the following way. The original alphabet $\mathcal{U}$ is
extended ("relaxed") to an alphabet $\mathcal{B} = \bigcup_{u\in\mathcal{U}} \mathcal{B}_u$, where the sets $\{\mathcal{B}_u\}$ are disjoint. The idea
here is that every coded symbol $u \in \mathcal{U}$ can be represented without ambiguity using any element
of $\mathcal{B}_u$ [24].$^1$ The vector $\mathbf{x} = [x_1, \dots, x_K]^T$ thus satisfies

$$ \mathbf{x} = \arg\min_{\tilde{\mathbf{x}} \in \mathcal{B}_{u_1}\times\cdots\times\mathcal{B}_{u_K}} \|\mathbf{T}\tilde{\mathbf{x}}\|^2 . \tag{2-5} $$
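For very small systems, the minimization in (2-5) can be carried out by exhaustive search, which makes the role of the relaxed alphabet concrete. The sketch below is only an illustration under stated assumptions: the binary input alphabet with the toy relaxed sets in `RELAXED`, the pseudo-inverse form of the ZF front-end, and the function name `brute_force_precoder` are all hypothetical choices, not the constructions analyzed in the paper.

```python
import itertools
import numpy as np

def brute_force_precoder(T, u, relaxed):
    """Return (x, energy) minimizing ||T x||^2 over the product of the relaxed sets.

    relaxed maps each input symbol u_k to a finite tuple of admissible representatives.
    """
    best_x, best_energy = None, np.inf
    for combo in itertools.product(*(relaxed[u_k] for u_k in u)):
        x = np.array(combo, dtype=complex)
        energy = np.linalg.norm(T @ x) ** 2
        if energy < best_energy:
            best_x, best_energy = x, energy
    return best_x, best_energy

# Hypothetical toy relaxation of the binary alphabet {+1, -1}: disjoint sets per symbol.
RELAXED = {1: (1, -3), -1: (-1, 3)}

rng = np.random.default_rng(0)
K = N = 4
# Channel with unit expected row norm (entries of variance 1/N).
H = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2 * N)
# Assumed ZF front-end: right pseudo-inverse of H, so that H @ T is the identity.
T = H.conj().T @ np.linalg.inv(H @ H.conj().T)

u = [1, -1, -1, 1]
x, energy = brute_force_precoder(T, u, RELAXED)
energy_no_relaxation = np.linalg.norm(T @ np.array(u, dtype=complex)) ** 2
```

Since the unrelaxed vector `u` is itself one of the candidates, `energy` can never exceed `energy_no_relaxation`. The search visits a number of candidates that is exponential in K, so it is only usable for toy dimensions.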



We note at this point that as an alternative to the normalization taken in (2-3), ensuring
an instantaneous transmit power constraint, a weaker average transmit power constraint can be
applied, by simply replacing $\mathcal{E}_{tot}$ with $E\{\mathcal{E}_{tot}\}$, where $E\{\cdot\}$ denotes expectation. However, since
we later concentrate on the energy penalty per symbol,

$$ \mathcal{E} \triangleq \frac{\mathcal{E}_{tot}}{K} , \tag{2-6} $$

and in view of the self-averaging property of the large system limit (as shall be made clear in 
the following), the two types of energy constraints yield the same asymptotic results. We thus 
focus for convenience throughout this paper on the instantaneous power constraint (as implied 



by (2-3)). Note also that in order to differentiate between the energy penalty induced by the 
precoding scheme, and the effect of the underlying symbol energy of the input alphabet $\mathcal{U}$, one
can alternatively represent the results in terms of what we refer to here as the precoding efficiency, 
defined through 

> (2-7) 



where $\sigma_u^2 = E\{|u|^2\}$ (with the expectation taken with respect to (2-2)).



3 Outline of the Replica Analysis 

In the following we describe the main ideas behind the replica analysis of the problem in hand, 
and provide a heuristic outline of the approach taken to derive the main results of this paper. The 



reader is referred to tutorial manuscripts such as [26], [27], [31] for an elaborated background on the
replica analysis. The fully detailed proofs are deferred to the appendices. 

We start here by focusing on the energy penalty, and note that the task of the nonlinear 
precoding block at the transmitter (see Figure 1) can be described as follows. Its task is equivalent
to the minimization of an objective function (called the Hamiltonian in physics literature) having
the quadratic form

$$ \mathcal{H}(\mathbf{x}) = \mathbf{x}^\dagger \mathbf{J} \mathbf{x} , \tag{3-1} $$

with $(\cdot)^\dagger$ denoting transpose conjugation and $\mathbf{J}$ being a random matrix of dimensions $K \times K$.
Thus, the minimum energy penalty per symbol can be expressed as

$$ \mathcal{E} \triangleq \frac{1}{K} \min_{\mathbf{x}\in\mathcal{B}_{\mathbf{u}}} \mathcal{H}(\mathbf{x}) , \tag{3-2} $$

where we use the shortened notation $\mathcal{B}_{\mathbf{u}} = \mathcal{B}_{u_1} \times \cdots \times \mathcal{B}_{u_K}$. Note also that to comply with (2-5)
one should take $\mathbf{J} = \mathbf{T}^\dagger\mathbf{T}$; however, since the results derived in the sequel hold, at least in part,
for a more general class of matrices, we retain the formulation as in (3-1).

1 For practical purposes one would also like to impose additional properties such as a certain minimum distance,
although, in principle, the underlying necessary condition is to avoid ambiguity. Note also that the normalization
in (2-3) makes the system insensitive to any scaling of the underlying alphabet.

To calculate the minimum of the objective function as defined in (3-1), it is convenient to 
introduce some notions from statistical physics (see, e.g., [31]). In particular, we define a discrete
probability distribution on the set of state vectors $\{\mathbf{x}\}$, namely the Boltzmann distribution, as

$$ P_B(\mathbf{x}) = \frac{1}{Z}\, e^{-\beta\mathcal{H}(\mathbf{x})} , \tag{3-3} $$

where the parameter $\beta > 0$ is referred to as the inverse temperature, $\beta = 1/T$, while the
normalization factor $Z$ is the so-called partition function, which is defined as

$$ Z = \sum_{\mathbf{x}\in\mathcal{B}_{\mathbf{u}}} e^{-\beta\mathcal{H}(\mathbf{x})} . \tag{3-4} $$

The energy of the system is given by 

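$$ \mathcal{E} = \sum_{\mathbf{x}\in\mathcal{B}_{\mathbf{u}}} P_B(\mathbf{x})\,\mathcal{H}(\mathbf{x}) , \tag{3-5} $$

and the entropy (disorder) is defined as

$$ \mathcal{S} = -\sum_{\mathbf{x}\in\mathcal{B}_{\mathbf{u}}} P_B(\mathbf{x}) \log P_B(\mathbf{x}) . \tag{3-6} $$

The definitions above hold for both discrete and continuous alphabets $\mathcal{B}_u$. The only difference is
that for continuous alphabets the sums over $\mathbf{x} \in \mathcal{B}_{\mathbf{u}}$ are replaced by integrals.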

At thermal equilibrium, the energy of the system is preserved, while the second law of ther- 
modynamics states that the entropy is the maximum possible. This is equivalent to minimizing 
the free energy of the system 

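$$ \mathcal{F} = \mathcal{E} - \frac{1}{\beta}\,\mathcal{S} , \tag{3-7} $$

where $\beta$, the inverse temperature, is in fact the Lagrange multiplier in the maximization of (3-6),
subject to the mean energy constraint. At equilibrium, the free energy can be expressed as

$$ \mathcal{F} = -\frac{1}{\beta} \log Z . \tag{3-8} $$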



Note that from Lagrangian duality the Boltzmann distribution (3-3) is also the solution to the 
problem of minimizing the energy for a given entropy. 

All mean thermodynamic quantities can now be derived directly from the free energy. In
particular, the energy of the system is

$$ \mathcal{E} = \frac{\partial (\beta\mathcal{F})}{\partial\beta} , \tag{3-9} $$

while its thermodynamic entropy (disorder) is

$$ \mathcal{S} = \beta^2\, \frac{\partial\mathcal{F}}{\partial\beta} . \tag{3-10} $$
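The relations between the Boltzmann averages and the free energy can be checked numerically on any finite set of state energies. The sketch below is a sanity check under assumptions: the energy values are arbitrary placeholders, the function names are hypothetical, and the derivatives with respect to the inverse temperature are approximated by central finite differences.

```python
import numpy as np

def log_partition(energies, beta):
    """log Z for a finite list of state energies (shifted for numerical stability)."""
    e = np.asarray(energies, dtype=float)
    e_min = e.min()
    return -beta * e_min + np.log(np.exp(-beta * (e - e_min)).sum())

def free_energy(energies, beta):
    """Free energy F = -(1/beta) log Z."""
    return -log_partition(energies, beta) / beta

def energy_and_entropy(energies, beta):
    """Mean energy and entropy under the Boltzmann distribution."""
    e = np.asarray(energies, dtype=float)
    log_z = log_partition(e, beta)
    p = np.exp(-beta * e - log_z)
    mean_energy = float(p @ e)
    entropy = float(p @ (beta * e + log_z))        # equals -sum p log p
    return mean_energy, entropy

energies = [0.3, 0.8, 0.8, 1.9, 2.4]               # arbitrary placeholder state energies
beta, d = 1.5, 1e-5
E, S = energy_and_entropy(energies, beta)
F = free_energy(energies, beta)

F_plus, F_minus = free_energy(energies, beta + d), free_energy(energies, beta - d)
dF_dbeta = (F_plus - F_minus) / (2 * d)
d_betaF_dbeta = ((beta + d) * F_plus - (beta - d) * F_minus) / (2 * d)
```

Up to finite-difference error, `F` equals `E - S / beta`, `d_betaF_dbeta` reproduces the mean energy `E`, and `beta**2 * dF_dbeta` reproduces the entropy `S`.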

In addition to the above quantities we can use the free energy and the partition function to obtain
the empirical joint distribution of the precoder input $\mathbf{u}$ and output $\mathbf{x}$, which is defined for general
$\beta$ as

$$ P^{(K)}_{X,U}(\xi,\upsilon) = \frac{1}{K} \sum_{\mathbf{x}\in\mathcal{B}_{\mathbf{u}}} P_B(\mathbf{x}) \sum_{k=1}^{K} 1\{(x_k,u_k) = (\xi,\upsilon)\} . \tag{3-11} $$

Eqs. (3-3) to (3-11) will be useful in deriving some of the results presented in the sequel.

The rationale behind the introduction of the Boltzmann distribution is that as $\beta \to \infty$, the
partition function becomes dominated by the terms corresponding to the minimum energy. Hence,
taking the logarithm and further normalizing with respect to $\beta$, one gets the desired limiting
quantity (energy, entropy or empirical distribution) at the minimum energy subspace of $\mathcal{B}_{\mathbf{u}}$. Note
that even if the energy minimizing vector is not unique, or in fact even if the number of such
vectors is exponential in $K$, one still gets the desired quantity when taking the limit $\beta \to \infty$.
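The zero-temperature argument can be made tangible with a small numerical experiment: for a finite list of state energies, the free energy approaches the minimum energy as the inverse temperature grows, even when the minimum is degenerate. The energy values below are arbitrary placeholders and the function name is hypothetical.

```python
import numpy as np

def free_energy(energies, beta):
    """-(1/beta) log sum_x exp(-beta * H(x)), computed with a shift for stability."""
    e = np.asarray(energies, dtype=float)
    e_min = e.min()
    return e_min - np.log(np.exp(-beta * (e - e_min)).sum()) / beta

energies = [1.0, 1.0, 1.0, 2.0, 3.5]               # three degenerate minima at energy 1.0
betas = [1.0, 10.0, 100.0, 1000.0]
values = [free_energy(energies, b) for b in betas]
gaps = [min(energies) - v for v in values]         # distance from the minimum energy
```

With g degenerate minima, the gap behaves like log(g)/beta for large beta, so it vanishes in the limit regardless of the degeneracy, as long as log(g) grows slower than beta.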

It is crucial to point out that in the above summation over the set of state vectors $\mathcal{B}_{\mathbf{u}}$, both the
input vector $\mathbf{u}$ and the matrix $\mathbf{J}$ are fixed. These random variables are called quenched. Therefore,
all the above manipulations still do not alleviate the difficulty of calculating the desired quantities.
In particular, the main difficulty comes from the free energy being a random variable itself, which
depends on the particular realizations of $\mathbf{J}$ and $\mathbf{u}$. As discussed in Section 1, the proofs of the
self-averaging property of the SK-model in [33]-[36] can be generalized to apply to the form of $\mathcal{H}(\mathbf{x})$
analyzed here. This means that the free energy converges in probability at the asymptotic limit
to a non-random quantity, i.e.,

$$ \lim_{K\to\infty} \Pr\left( \frac{1}{K}\left| \mathcal{F} - E\{\mathcal{F}\} \right| > \epsilon \right) = 0 \quad \forall \epsilon > 0 , \tag{3-12} $$

where the expectation E{-} is over all realizations of J and u. As a result, all quantities that can 
be obtained from the free energy in an analytic manner, e.g., by differentiation of a parameter, are 
also self-averaging. The empirical joint distribution of the precoder input and output converges 



to a non-random distribution which is expressed by (3-16). This self-averaging property makes 
the problem more straightforward to tackle, since we may now hope to get analytic results for the 
average of the free energy and its derivatives. 

With that in mind, the limiting energy penalty (per symbol) can be represented as 



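$$ \begin{aligned}
\mathcal{E} &= \lim_{K\to\infty} \frac{1}{K} \min_{\mathbf{x}\in\mathcal{B}_{\mathbf{u}}} \mathbf{x}^\dagger\mathbf{J}\mathbf{x}
 = -\lim_{K\to\infty} \lim_{\beta\to\infty} \frac{1}{\beta K} \log \sum_{\mathbf{x}\in\mathcal{B}_{\mathbf{u}}} e^{-\beta \mathbf{x}^\dagger\mathbf{J}\mathbf{x}} \\
&= -\lim_{K\to\infty} \lim_{\beta\to\infty} \frac{1}{\beta K}\, E\left\{ \log \sum_{\mathbf{x}\in\mathcal{B}_{\mathbf{u}}} e^{-\beta \mathbf{x}^\dagger\mathbf{J}\mathbf{x}} \right\} .
\end{aligned} \tag{3-13} $$

To obtain the empirical joint distribution $P_{X,U}$, we follow a technique very common in the physics
literature [27], and introduce a dummy variable $h \in \mathbb{R}$, as well as the function

$$ V(h,\xi,\upsilon,\mathbf{x},\mathbf{u}) = -h \sum_{k=1}^{K} 1\{(x_k,u_k) = (\xi,\upsilon)\} . \tag{3-14} $$

If we add this term to $\mathcal{H}(\mathbf{x})$ in the exponent, the partition function gets modified to

$$ Z(h) = \sum_{\mathbf{x}\in\mathcal{B}_{\mathbf{u}}} e^{-\beta\left(\mathcal{H}(\mathbf{x}) + V(h)\right)} , \tag{3-15} $$

where we have dropped the explicit dependence of $Z(h)$ and $V(h)$ on $\upsilon$ and $\xi$ (as well as $\mathbf{u}$ and $\mathbf{x}$)
for the sake of notational compactness. In the sequel, any dependence on $h$ shall implicitly also
indicate a dependence on $\upsilon$ and $\xi$. Using the above partition function we obtain a modified free
energy using (3-8). Upon differentiation with respect to $h$, setting $h = 0$, and letting $\beta \to \infty$ we
get

$$ \begin{aligned}
P_{X,U}(\xi,\upsilon) &= \lim_{K\to\infty} \lim_{\beta\to\infty} P^{(K)}_{X,U}(\xi,\upsilon) && \text{(3-16)} \\
&= -\lim_{K\to\infty} \frac{1}{K} \lim_{\beta\to\infty} \frac{\partial}{\partial h}\, E\{\mathcal{F}(\beta,h)\} \Big|_{h=0} , && \text{(3-17)}
\end{aligned} $$

where $\mathcal{F}(\beta,h)$ denotes the free energy for the modified partition function $Z(h)$.$^2$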
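The mechanism by which differentiating the modified free energy with respect to the dummy variable h produces the empirical distribution can be verified on a toy instance at finite K and finite inverse temperature (before any limit or expectation is taken). In the sketch below, the real-valued toy relaxation `RELAXED`, the random matrix, and all function names are illustrative assumptions; the derivative is approximated by a central finite difference.

```python
import itertools
import numpy as np

RELAXED = {1: (1.0, -3.0), -1: (-1.0, 3.0)}        # hypothetical toy relaxation (real-valued)

def states_for(u):
    """All candidate precoder outputs for the input vector u."""
    return [np.array(c) for c in itertools.product(*(RELAXED[s] for s in u))]

def match_count(x, u, xi, upsilon):
    """Number of users k with (x_k, u_k) == (xi, upsilon)."""
    return sum(1 for xk, uk in zip(x, u) if xk == xi and uk == upsilon)

def free_energy(J, u, beta, h, xi, upsilon):
    """-(1/beta) log Z(h), with V(h) = -h * match_count added to the Hamiltonian."""
    exponents = np.array([
        -beta * (x @ J @ x - h * match_count(x, u, xi, upsilon))
        for x in states_for(u)
    ])
    m = exponents.max()
    return -(m + np.log(np.exp(exponents - m).sum())) / beta

def empirical_distribution(J, u, beta, xi, upsilon):
    """Direct evaluation: Boltzmann average of the matching fraction."""
    xs = states_for(u)
    exponents = np.array([-beta * (x @ J @ x) for x in xs])
    p = np.exp(exponents - exponents.max())
    p /= p.sum()
    counts = np.array([match_count(x, u, xi, upsilon) for x in xs])
    return float(p @ counts) / len(u)

rng = np.random.default_rng(3)
u = [1, -1, 1, 1]
T = rng.standard_normal((4, 4))
J = T.T @ T                                         # J = T^T T for a real-valued toy front-end
beta, xi, upsilon, d = 2.0, 1.0, 1, 1e-5

direct = empirical_distribution(J, u, beta, xi, upsilon)
slope = (free_energy(J, u, beta, d, xi, upsilon)
         - free_energy(J, u, beta, -d, xi, upsilon)) / (2 * d)
from_derivative = -slope / len(u)
```

The two numbers `direct` and `from_derivative` agree up to finite-difference error, which is the finite-size counterpart of the identity used in the text.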

The next step in the analysis is to invoke some underlying assumptions. The first assumption 
is that the random matrix J can be decomposed as 

$$ \mathbf{J} = \mathbf{U}\mathbf{D}\mathbf{U}^\dagger , \tag{3-18} $$

where $\mathbf{D}$ is a diagonal matrix with diagonal elements being the eigenvalues of $\mathbf{J}$, and $\mathbf{U}$ is a unitary
Haar distributed matrix [40]. It is further assumed that the empirical distribution of the diagonal
elements of $\mathbf{D}$ converges to a nonrandom distribution uniquely characterized by its R-transform$^3$
$R(\cdot)$, which is assumed to exist.

2 An alternative method for deriving the limiting empirical distribution, which relies on the limiting moments,
can be found in [30], albeit with more restrictive assumptions on the limiting distribution.
3 For a definition of the R-transform, see Appendix E.

Going back to the original communication system model, note that we are in fact interested
in the normalized averages of most of the quantities described above, at the limit as $K \to \infty$.
Therefore, to make a distinction, while retaining the relation between the quantities, we shall use
henceforth the following notational convention



^03) 4 lim 

K—t-oo K 

fitf) 4 lim 



(3-19) 
(3-20) 
(3-21) 



Calculating the expectation of a logarithm of a sum of exponents (see (3-13)) is a formidable
task. The standard approach in statistical physics is to invoke the so-called replica "trick". The
latter is based on the following identity$^4$

$$ E\{\log Z\} = \lim_{n\to 0^+} \frac{1}{n} \log E\{Z^n\} , \tag{3-22} $$

which holds in general for real $n$. The "trick" here relies on the assumption that the right hand
side (RHS) of (3-22) can be evaluated for integer $n$, and that the desired quantity can be found by
analytic continuation in the vicinity of $n = 0^+$. Although this "trick" does not a priori have any
justified validity, its success in statistical physics, and more recently in communications theory,
makes it a reasonable approach. Further assuming that the limits with respect to $K$ and $n$ can
be interchanged (which is the common practice in replica analyses), (3-13) can be rewritten as



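$$ \begin{aligned}
\mathcal{E} &= \lim_{\beta\to\infty} \mathcal{E}(\beta) \\
&= -\lim_{\beta\to\infty} \lim_{n\to 0^+} \frac{1}{n} \lim_{K\to\infty} \frac{1}{\beta K} \log E\left\{ \left( \sum_{\mathbf{x}\in\mathcal{B}_{\mathbf{u}}} e^{-\beta \mathbf{x}^\dagger\mathbf{J}\mathbf{x}} \right)^{n} \right\} \\
&= -\lim_{\beta\to\infty} \frac{1}{\beta} \lim_{n\to 0} \frac{1}{n} \lim_{K\to\infty} \frac{1}{K} \log E\left\{ \sum_{\{\mathbf{x}_a\}} e^{-\beta \sum_{a=1}^{n} \mathbf{x}_a^\dagger\mathbf{J}\mathbf{x}_a} \right\} \\
&= -\lim_{\beta\to\infty} \frac{1}{\beta} \lim_{n\to 0} \frac{1}{n} \lim_{K\to\infty} \frac{1}{K} \log E\left\{ \sum_{\{\mathbf{x}_a\}} e^{-\beta\, \mathrm{Tr}\left( \mathbf{J} \sum_{a=1}^{n} \mathbf{x}_a\mathbf{x}_a^\dagger \right)} \right\} ,
\end{aligned} \tag{3-23} $$

where we use the notation $\sum_{\{\mathbf{x}_a\}} = \sum_{\mathbf{x}_1\in\mathcal{B}_{\mathbf{u}}} \cdots \sum_{\mathbf{x}_n\in\mathcal{B}_{\mathbf{u}}}$, and $\mathrm{Tr}(\cdot)$ denotes the trace operator.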
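The replica identity (3-22) that underlies the rewriting in (3-23) can be checked numerically on synthetic data. The sketch below is an illustration under assumptions: the samples of log Z are drawn from an arbitrary Gaussian distribution (not from the precoding problem), expectations are replaced by sample means, and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical samples of log Z; any positive random Z with finite moments would do.
log_Z = rng.normal(loc=2.0, scale=1.0, size=20000)

def replica_estimate(log_Z, n):
    """(1/n) log E{Z^n}, with the expectation replaced by a sample mean."""
    a = n * log_Z
    m = a.max()
    return (m + np.log(np.mean(np.exp(a - m)))) / n

def alternative_estimate(log_Z, n):
    """(E{Z^n} - 1) / n, an equivalent small-n representation."""
    return (np.mean(np.exp(n * log_Z)) - 1.0) / n

direct = log_Z.mean()                               # sample version of E{log Z}
for_small_n = replica_estimate(log_Z, 1e-3)
for_n_equal_one = replica_estimate(log_Z, 1.0)      # log E{Z}, the "annealed" average
alt_small_n = alternative_estimate(log_Z, 1e-3)
```

For small n both estimates approach the directly computed average of log Z, whereas at n = 1 the quantity log E{Z} is strictly larger by Jensen's inequality. The replica method evaluates the moments for integer n and relies on continuation to the neighborhood of zero, a step that this numerical check does not address.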



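The summation over the replicated precoder output vectors $\{\mathbf{x}_a\}_{a=1}^{n}$ in (3-23) is performed by
splitting the replicas into subshells, defined through an $n \times n$ matrix $\mathbf{Q}$

$$ S(\mathbf{Q}) = \left\{ \mathbf{x}_1, \dots, \mathbf{x}_n \,\middle|\, \mathbf{x}_a^\dagger\mathbf{x}_b = K Q_{ab} \right\} . \tag{3-24} $$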
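The matrix Q in (3-24) collects the normalized inner products between replicas, and a short sketch shows how it is formed for a given set of replica vectors. The random unit-energy QPSK replicas and the function name `overlap_matrix` are illustrative assumptions.

```python
import numpy as np

def overlap_matrix(replicas):
    """Q_ab = x_a^dagger x_b / K for replicas stacked as the rows of an (n, K) array."""
    X = np.asarray(replicas, dtype=complex)
    K = X.shape[1]
    return X.conj() @ X.T / K

rng = np.random.default_rng(2)
n, K = 3, 8                                         # number of replicas and users (toy sizes)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
replicas = rng.choice(qpsk, size=(n, K))            # placeholder replica vectors
Q = overlap_matrix(replicas)
```

By construction Q is a Gram matrix, hence Hermitian and positive semidefinite, and its diagonal holds the per-symbol energy of each replica (equal to one for unit-energy QPSK).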



The limit K — > oo allows us to perform the following derivations by saddle point integration. This 
first yields the following general result. 



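4 An equivalent representation often encountered in the literature is $E\{\log Z\} = \lim_{n\to 0^+} \frac{E\{Z^n\} - 1}{n}$.

Proposition 3.1 For any inverse temperature $\beta$, any structure of $\mathbf{Q}$ consistent with (3-24), and
any R-transform $R(\cdot)$ such that $R(\mathbf{Q})$ is well-defined,$^5$ the energy is given by

$$ \mathcal{E}(\beta) = \lim_{n\to 0} \frac{1}{n}\, \mathrm{Tr}\left[ \mathbf{Q}\, R(-\beta\mathbf{Q}) \right] , \tag{3-25} $$

where $\mathbf{Q}$ is the solution to the saddle point equation

$$ \mathbf{Q} = \int \frac{\sum_{\mathbf{x}\in\mathcal{B}_u^n} \mathbf{x}\mathbf{x}^\dagger\, e^{-\beta \mathbf{x}^\dagger R(-\beta\mathbf{Q}) \mathbf{x}}}{\sum_{\mathbf{x}\in\mathcal{B}_u^n} e^{-\beta \mathbf{x}^\dagger R(-\beta\mathbf{Q}) \mathbf{x}}}\, dF_U(u) , \tag{3-26} $$

with $\mathcal{B}_u^n$ denoting the $n$-fold Cartesian product of $\mathcal{B}_u$.
Proof: See Appendix C.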



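With the help of Proposition 3.1 the energy can be written as

$$ \begin{aligned}
\mathcal{E}(\beta) &= \lim_{K\to\infty} \frac{1}{K}\, E\left\{ \frac{\sum_{\mathbf{x}\in\mathcal{B}_{\mathbf{u}}} \mathbf{x}^\dagger\mathbf{J}\mathbf{x}\, e^{-\beta \mathbf{x}^\dagger\mathbf{J}\mathbf{x}}}{\sum_{\mathbf{x}\in\mathcal{B}_{\mathbf{u}}} e^{-\beta \mathbf{x}^\dagger\mathbf{J}\mathbf{x}}} \right\} && \text{(3-27)} \\
&= \lim_{n\to 0} \frac{1}{n} \int \frac{\sum_{\mathbf{x}\in\mathcal{B}_u^n} \mathbf{x}^\dagger R(-\beta\mathbf{Q})\mathbf{x}\, e^{-\beta \mathbf{x}^\dagger R(-\beta\mathbf{Q}) \mathbf{x}}}{\sum_{\mathbf{x}\in\mathcal{B}_u^n} e^{-\beta \mathbf{x}^\dagger R(-\beta\mathbf{Q}) \mathbf{x}}}\, dF_U(u) , && \text{(3-28)}
\end{aligned} $$

with $\mathbf{Q}$ given by (3-26). In (3-27), $\mathbf{x}$ is a $K$-dimensional vector and its components represent
users. The contributions of the users to the energy arise due to the inner product $\mathbf{x}^\dagger\mathbf{J}\mathbf{x}$ and are
coupled, unless $\mathbf{J}$ is diagonal. In (3-28), $\mathbf{x}$ is an $n$-dimensional vector and its components represent
replicas of the same user. The contributions of the users to the energy arise due to integration
over the distribution $F_U(u)$, and are decoupled and additive. This is just another incarnation of
the decoupling principle that, under the assumption of replica symmetry, was addressed in [30].
Here, we find that it holds for the energy of general (also replica symmetry breaking) spin glass
systems and their equivalents in communication theory.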



Another interesting observation is the following. In [41], an analogy between the R-transform
and effective interference in linear MMSE detection was discovered, and the additivity of the
effective interference of coupled users was explained based on the additivity of the R-transforms
of free random variables. Relying on the code symbols of different users $\{u_k\}$ being i.i.d., we can
rewrite (3-28) as



$$ \mathcal{E}(\beta) = \lim_{K\to\infty} \frac{1}{K} \sum_{k=1}^{K} \underbrace{ \lim_{n\to 0} \frac{1}{n} \frac{\sum_{\boldsymbol{x} \in \mathcal{B}_{u_k}^n} \boldsymbol{x}^\dagger R(-\beta \boldsymbol{Q}) \boldsymbol{x} \, e^{-\beta \boldsymbol{x}^\dagger R(-\beta \boldsymbol{Q}) \boldsymbol{x}}}{\sum_{\boldsymbol{x} \in \mathcal{B}_{u_k}^n} e^{-\beta \boldsymbol{x}^\dagger R(-\beta \boldsymbol{Q}) \boldsymbol{x}}} }_{\triangleq \, \mathcal{E}_k(\beta)} \tag{3-29} $$

and interpret $\mathcal{E}_k(\beta)$ as the effective energy of user $k$. Like the effective interference in [41], it
depends only on the signal constellation of user $k$ and the R-transform, and it is additive among
users. In contrast to [41], (3-29) is more general and neither constrained to linear detectors nor to
Gaussian symbol alphabets.

5 Note that if $R(\cdot)$ has a series expansion, $R(\boldsymbol{Q})$ is well-defined. Since $R(\cdot)$ is the free cumulant generating
function, $R(\boldsymbol{Q})$ is well-defined if all moments of the asymptotic eigenvalue distribution of $\boldsymbol{J}$ exist.

To produce explicit results, note that the limit $n \to 0$ of the $n \times n$ matrix $\boldsymbol{Q}$ can only be defined
by imposing a certain structure onto the crosscorrelation matrix $\boldsymbol{Q}$ at the saddle point, unless the
summations in (3-26) can be evaluated explicitly, e.g., for $\mathcal{B}_u = \mathbb{C}$. The simplest structure is that
of replica symmetry, which in the current setting boils down to

$$ \boldsymbol{Q} = q_0 \boldsymbol{1}_{n \times n} + \frac{\chi_0}{\beta} \boldsymbol{I}_{n \times n} , \tag{3-30} $$

for some constants $\{q_0, \chi_0\}$. The 1RSB assumption leads to a more involved structure, formulated
as

$$ \boldsymbol{Q} = q_1 \boldsymbol{1}_{n \times n} + p_1 \boldsymbol{I}_{\frac{n\beta}{\mu_1} \times \frac{n\beta}{\mu_1}} \otimes \boldsymbol{1}_{\frac{\mu_1}{\beta} \times \frac{\mu_1}{\beta}} + \frac{\chi_1}{\beta} \boldsymbol{I}_{n \times n} , \tag{3-31} $$

using the constants $\{q_1, p_1, \chi_1, \mu_1\}$. The above constants (i.e., $\{q_0, \chi_0\}$ for RS, and $\{q_1, p_1, \chi_1, \mu_1\}$
for 1RSB) are referred to as macroscopic parameters, and are obtained from the corresponding saddle
point equations. The limiting energy penalty can then be expressed in terms of these macroscopic
parameters, as shown in the following sections. An analogous procedure can be employed to obtain
the limiting empirical joint distribution of the precoder input and output using (3-16).

Replica symmetry breaking is not limited to one step, and in fact in order to exactly characterize
the limiting energy penalty and precoder output statistics, we would eventually need to consider
full RSB, as discussed in Section 1. However, we will only present here precoding results up to the
accuracy of 1RSB for purposes of analytical tractability. For the interested reader and the sake of
completeness, we include general results on multiple-step RSB in Appendix A.
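The block structures (3-30) and (3-31) can be made concrete for integer sizes. The following is a minimal sketch, not part of the original analysis: it assumes an integer number of replicas `n` and an integer block size `m` standing in for $\mu_1/\beta$, and the helper names (`rs_matrix`, `rsb1_matrix`, `matvec`) are hypothetical. It shows that the 1RSB matrix has only three distinct eigenvalues, which is what makes the limit $n \to 0$ tractable.

```python
def rs_matrix(n, q0, chi0, beta):
    """RS structure (3-30): Q = q0 * ones + (chi0 / beta) * identity."""
    return [[q0 + (chi0 / beta if a == b else 0.0) for b in range(n)]
            for a in range(n)]


def rsb1_matrix(n, m, q1, p1, chi1, beta):
    """1RSB structure (3-31) with integer block size m (assumed to play the role of mu1/beta)."""
    if n % m != 0:
        raise ValueError("n must be a multiple of the block size m")
    Q = []
    for a in range(n):
        row = []
        for b in range(n):
            val = q1                      # all-ones part
            if a // m == b // m:
                val += p1                 # same block: identity (x) all-ones part
            if a == b:
                val += chi1 / beta        # diagonal part
            row.append(val)
        Q.append(row)
    return Q


def matvec(Q, v):
    """Plain matrix-vector product, used to verify eigenvectors."""
    return [sum(q * x for q, x in zip(row, v)) for row in Q]
```

With `n = 6`, `m = 3`, the all-ones vector is an eigenvector with eigenvalue $\chi_1/\beta + p_1 m + q_1 n$, a difference of two block indicators has eigenvalue $\chi_1/\beta + p_1 m$, and a difference of two replicas inside one block has eigenvalue $\chi_1/\beta$.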



4 Limiting Characterization of the Precoder Output 

We restrict ourselves in the following to 1RSB analysis of the limiting characteristics of the precoder 
output. As demonstrated in the sequel, when compared to simulation results at finite numbers of 
antennas, 1RSB gives quite accurate approximations for the quantities of interest, while the RS 
ansatz does so only in special cases. 



4.1 The 1RSB Solution 



Applying the 1RSB ansatz, as outlined in Section 3 (see in particular (3-31)), the limiting
properties of the precoder output are characterized by means of four macroscopic parameters
$q_1, p_1, \chi_1, \mu_1 \in (0, \infty)$, which are determined as specified below. Let $\boldsymbol{J}$ be a $K \times K$ random matrix
satisfying the decomposability property (3-18), and let $R(\cdot)$ denote the R-transform of its limiting
eigenvalue distribution. Consider now the following function of complex arguments

$$ \mathcal{X}(y, z) \triangleq e^{-\mu_1 \min_{x \in \mathcal{B}_u} \left\{ e_1 |x|^2 - 2 \Re\{ x (f_1 z^* + g_1 y^*) \} \right\}} , \quad (y, z) \in \mathbb{C}^2 , \tag{4-1} $$

where $\Re\{\cdot\}$ takes the real part of the argument, and the parameters $e_1$, $g_1$ and $f_1$ are defined as

$$ e_1 = R(-\chi_1) , \tag{4-2} $$

$$ g_1 = \sqrt{\frac{R(-\chi_1) - R(-\chi_1 - \mu_1 p_1)}{\mu_1}} , \tag{4-3} $$

$$ f_1 = \sqrt{q_1 R'(-\chi_1 - \mu_1 p_1)} . \tag{4-4} $$
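Furthermore, denote its normalized version by

$$ \tilde{\mathcal{X}}(y, z) \triangleq \frac{\mathcal{X}(y, z)}{\int_{\mathbb{C}} \mathcal{X}(\tilde{y}, z) \, \mathrm{D}\tilde{y}} \tag{4-5} $$

to compact notation. Then, using the shortened notations (for $z_{\rm re}, z_{\rm im} \in \mathbb{R}$)

$$ \int_{\mathbb{C}} (\cdot) \, \mathrm{D}z \triangleq \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (\cdot) \, \frac{e^{-|z|^2}}{\pi} \, dz_{\rm re} \, dz_{\rm im} , \quad z = z_{\rm re} + j z_{\rm im} \in \mathbb{C} , \tag{4-6} $$

and

$$ \int_{\mathbb{C}^2} (\cdot) \, \mathrm{D}y \, \mathrm{D}z \triangleq \int_{\mathbb{C}} \int_{\mathbb{C}} (\cdot) \, \mathrm{D}y \, \mathrm{D}z , \tag{4-7} $$

the parameters $\{q_1, p_1, \chi_1, \mu_1\}$ are given by the solutions to the four coupled equations$^6$

$$ \chi_1 + p_1 \mu_1 = \frac{1}{f_1} \int \!\! \int_{\mathbb{C}^2} \Re\left\{ z^* \operatorname*{argmin}_{x \in \mathcal{B}_u} \left| f_1 z + g_1 y - e_1 x \right| \right\} \tilde{\mathcal{X}}(y, z) \, \mathrm{D}y \, \mathrm{D}z \, dF_U(u) , \tag{4-8} $$

$$ \chi_1 + (q_1 + p_1) \mu_1 = \frac{1}{g_1} \int \!\! \int_{\mathbb{C}^2} \Re\left\{ y^* \operatorname*{argmin}_{x \in \mathcal{B}_u} \left| f_1 z + g_1 y - e_1 x \right| \right\} \tilde{\mathcal{X}}(y, z) \, \mathrm{D}y \, \mathrm{D}z \, dF_U(u) , \tag{4-9} $$

$$ q_1 + p_1 = \int \!\! \int_{\mathbb{C}^2} \left| \operatorname*{argmin}_{x \in \mathcal{B}_u} \left| f_1 z + g_1 y - e_1 x \right| \right|^2 \tilde{\mathcal{X}}(y, z) \, \mathrm{D}y \, \mathrm{D}z \, dF_U(u) , \tag{4-10} $$

and

$$ \int_{\chi_1}^{\chi_1 + \mu_1 p_1} R(-w) \, dw = \int \!\! \int_{\mathbb{C}} \log\left( \int_{\mathbb{C}} \mathcal{X}(y, z) \, \mathrm{D}y \right) \mathrm{D}z \, dF_U(u) - 2 \chi_1 R(-\chi_1) + (\mu_1 q_1 + 2 \chi_1 + 2 \mu_1 p_1) R(-\chi_1 - \mu_1 p_1) - 2 \mu_1 q_1 (\chi_1 + \mu_1 p_1) R'(-\chi_1 - \mu_1 p_1) . \tag{4-11} $$

The limiting properties of the precoder outputs can now be summarized by means of the following
two propositions. The detailed proofs are provided in Appendices B.1 and B.2, respectively.

6 In general these coupled equations have multiple solutions and one needs to choose the solution that minimizes
the energy penalty.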



Proposition 4.1 Suppose the random matrix $\boldsymbol{J}$ satisfies the decomposability property (3-18).
Then under some technical assumptions, including in particular one-step replica symmetry breaking,
the effective energy penalty per symbol $\mathcal{E}_{\rm tot}/K$ converges in probability as $K, N \to \infty$,
$K/N \to \alpha < \infty$, to

$$ \mathcal{E}_{\rm rsb1} = \left( q_1 + p_1 + \frac{\chi_1}{\mu_1} \right) R(-\chi_1 - \mu_1 p_1) - \frac{\chi_1}{\mu_1} R(-\chi_1) - q_1 (\chi_1 + \mu_1 p_1) R'(-\chi_1 - \mu_1 p_1) . \tag{4-12} $$

The conditional limiting empirical distribution of the precoder's outputs is specified next.



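Proposition 4.2 With the same underlying assumptions as in Proposition 4.1, the limiting conditional
empirical distribution of the nonlinear precoder's outputs given an input symbol $u$ satisfies

$$ P_{x|u}(\xi | u) = \int_{\mathbb{C}^2} 1\left\{ \xi = \operatorname*{argmin}_{x \in \mathcal{B}_u} \left| f_1 z + g_1 y - e_1 x \right| \right\} \tilde{\mathcal{X}}(y, z) \, \mathrm{D}y \, \mathrm{D}z , \tag{4-13} $$

where $1\{\cdot\}$ denotes the indicator function.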



4.2 A Replica Symmetric Reduction 

Although the 1RSB solution of the replica analysis leads in principle to a more accurate description
of the large system limit, corresponding results can also be derived using the simplifying
assumption that the system exhibits a replica symmetric behavior (see (3-30)). These results shall
be used in the sequel to demonstrate the impact of replica symmetry breaking. However, they can
also be extremely useful for more conveniently analyzing settings that do exhibit replica symmetric
properties, such as the case of convex extended alphabet sets addressed in [25]. A convex alphabet
example is discussed in Section 7.



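The limiting energy penalty under the RS assumption was in fact already derived in [24],
and the result is recalled in the following proposition. The result is given in terms of the two
macroscopic parameters $q_0, \chi_0 \in (0, \infty)$, which are obtained through the solution of the two
coupled equations

$$ q_0 = \int \!\! \int_{\mathbb{C}} \left| \operatorname*{argmin}_{x \in \mathcal{B}_u} \left| z - \frac{R(-\chi_0) \, x}{\sqrt{q_0 R'(-\chi_0)}} \right| \right|^2 \mathrm{D}z \, dF_U(u) , \tag{4-14} $$

and

$$ \chi_0 = \frac{1}{\sqrt{q_0 R'(-\chi_0)}} \int \Re\left\{ \int_{\mathbb{C}} z^* \operatorname*{argmin}_{x \in \mathcal{B}_u} \left| z - \frac{R(-\chi_0) \, x}{\sqrt{q_0 R'(-\chi_0)}} \right| \mathrm{D}z \right\} dF_U(u) . \tag{4-15} $$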



Proposition 4.3 ([24], Proposition 1) Suppose the random matrix $\boldsymbol{J}$ satisfies the decomposability
property (3-18). Then under some technical assumptions, including in particular replica
symmetry, the effective energy penalty per symbol $\mathcal{E}_{\rm tot}/K$ converges in probability as $K, N \to \infty$,
$K/N \to \alpha < \infty$, to$^7$

$$ \mathcal{E}_{\rm rs} = q_0 \left[ R(-\chi_0) - \chi_0 R'(-\chi_0) \right] . \tag{4-16} $$

The limiting conditional distribution of the precoder outputs can also be characterized under
the RS assumption, in an analogous manner to Proposition 4.2.



Proposition 4.4 With the same underlying assumptions as in Proposition 4.3, the limiting conditional
empirical distribution of the nonlinear precoder's outputs given an input symbol $u$ satisfies

$$ P_{x|u}(\xi | u) = \int_{\mathbb{C}} 1\left\{ \xi = \operatorname*{argmin}_{x \in \mathcal{B}_u} \left| z - \frac{R(-\chi_0) \, x}{\sqrt{q_0 R'(-\chi_0)}} \right| \right\} \mathrm{D}z . \tag{4-17} $$

This is the measure of the corresponding Voronoi region in the scaled conditional signal constellation
$\mathcal{B}_u$, with respect to the (complex) Gaussian probability measure.

Proof: The proof follows the same steps as in the proof of Proposition 4.2, while replacing (3-31)
with (3-30).



4.3 Zero-Temperature Entropy 

One way to demonstrate the degree of consistency of the RS and 1RSB solutions is to look at
their limiting (thermodynamic) zero-temperature entropy, defined as $\mathcal{S} = \lim_{\beta \to \infty} \mathcal{S}(\beta)$. It can
also be obtained in a manner similar to Propositions 4.1 and 4.3. In Appendix B.3 we show:

Proposition 4.5 With the same underlying assumptions as in Propositions 4.1 or 4.3, the limiting
entropy per symbol converges to

$$ \mathcal{S} = \chi R(-\chi) - \int_0^{\chi} R(-w) \, dw \tag{4-18} $$

with $\chi$ denoting $\chi_1$ and $\chi_0$ for 1RSB and RS, respectively.

Proposition 4.5 gives rise to the conjecture that the entropy for general $r$-step RSB is given by
$\chi_r R(-\chi_r) - \int_0^{\chi_r} R(-w) \, dw$ (see (A-1) for the definition of the macroscopic parameters in the
general setting).

In any stable thermodynamic system the entropy is non-negative for all temperatures. However,
one of the main pitfalls of the RS solution of the original SK-model is that its zero-temperature
entropy is negative, indicating an instability [26]. For all R-transforms that are strictly increasing
functions of negative real arguments, Proposition 4.5 clearly implies that the entropy is always
negative, becoming zero only when the zero temperature value of $\chi_1$, respectively $\chi_0$, approaches
zero. While the full RSB solution has been shown to have vanishing entropy at zero temperature
and corresponds to the correct solution, the following lemma, proven in Appendix E, indicates that
negative entropy is a rather common effect for finite RSB steps.

7 In [24], the self-averaging property was stated as an assumption, since the authors were not aware of [34].



Lemma 4.6 The R-transform, wherever its derivative with respect to a real argument exists, is 
an increasing function. If the probability distribution is different from a single mass point, the 
R-transform is strictly increasing. 

Note that the above argument for the entropy holds only for discrete state variables. In the 
case of continuous alphabets, the (then differential) entropy of a system can in fact be negative. 
Therefore, a negative zero-temperature entropy is not an alarm bell per se. For discrete state 
variables, the zero-temperature entropy serves as a measure of accuracy: the closer it is to zero, 
the better the approximation. 



5 Zero-Forcing Front-End 

To gain more insight into the impact on system performance of the nonlinear precoding scheme 
under investigation, we now particularize to a specific linear front-end, namely the ZF front-end. 
The precoding matrix T in this case is given by the pseudo-inverse of the channel transfer matrix, 
which we write here as 

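$$ \boldsymbol{T} = \boldsymbol{H}^{+} = \lim_{\epsilon \to 0} \boldsymbol{H}^\dagger \left( \boldsymbol{H} \boldsymbol{H}^\dagger + \epsilon \boldsymbol{I} \right)^{-1} . \tag{5-1} $$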

The underlying assumptions are that $N \ge K$ and that the matrix $\boldsymbol{H}\boldsymbol{H}^\dagger$ is almost surely (a.s.)
positive definite.$^8$ Focusing on the asymptotic regime for $K/N \to \alpha \le 1$, then using (2-1), (2-3),
and Proposition 4.1, the equivalent single-user channel observed by user $i$ is

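$$ y_i = x_i + \tilde{n}_i , \tag{5-2} $$

where $\tilde{n}_i$ is a zero mean circularly symmetric complex Gaussian noise with variance $1/\rho$,

$$ \rho = \frac{\mathrm{snr}}{\mathcal{E}_{\rm rsb1}} \tag{5-3} $$

denotes the effective received SNR, and $\mathcal{E}_{\rm rsb1}$ is given by (4-12).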



Proposition 5.1 Employing the same underlying assumptions as in Proposition 4.1, then with
a ZF front-end the channel observed by a randomly chosen user is equivalent in the large system
limit to a concatenated single-user channel, with input $u \in \mathcal{U}$, intermediate output $x \in \mathcal{B}_u$, and
final output $y \in \mathbb{C}$, specified by the Markov chain $u - x - y$ as shown in Figure 2. This Markov chain
is defined by the following joint probability density function

$$ f_{U,X,Y}(u, x, y) = f_U(u) \, f_{X|U}(x|u) \, f_{Y|X}(y|x) , \tag{5-4} $$



8 In Section 8 we will also allow for $N < K$, following the treatment in [42].
[Figure 2, block diagram "Equivalent Single User Channel Model": Single User Channel Input, $f_{X|U}(x|u)$, Intermediate Output, $f_{Y|X}(y|x)$, Single User Channel Output]

Figure 2: Schematic description of the equivalent single user channel model for a ZF front-end.



where

$$ f_{X|U}(x|u) = \sum_{\tilde{x} \in \mathcal{B}_u} P_{x|u}(\tilde{x}|u) \, \delta(x - \tilde{x}) \tag{5-5} $$

with $P_{x|u}(\tilde{x}|u)$ given by (4-13), and

$$ f_{Y|X}(y|x) = \frac{\rho}{\pi} e^{-\rho |y - x|^2} \tag{5-6} $$

is the (complex) Gaussian density with mean $x$ and variance $1/\rho$.

Proof: The proposition follows straightforwardly from Proposition 4.2 and (5-2).



Note that the RS reduction of the above result is readily obtained from Propositions 4.3 and 4.4
by replacing $\mathcal{E}_{\rm rsb1}$ in (5-3) with $\mathcal{E}_{\rm rs}$ of (4-16), and taking (4-17) for $P_{x|u}(\tilde{x}|u)$ in (5-5).



The achievable throughput of the nonlinear precoding scheme can be derived from the equivalent
single-user channel model using Proposition 5.1. Accordingly, the achievable rate of a randomly
chosen user is given by the mutual information$^9$ between the input $u$ and received signal $y$,
i.e.,

$$ R = I(u; y) = h(y) - h(y|u) , \tag{5-7} $$

where $h(\cdot)$ and $h(\cdot|\cdot)$ denote differential entropy and conditional differential entropy, respectively
(which can be readily calculated using Proposition 5.1). The normalized spectral efficiency is then
given by

$$ C \triangleq \frac{K}{N} R \; \longrightarrow \; \alpha R , \tag{5-8} $$

and it is functionally dependent on the system average $E_b/N_0$ through the relation [43]

$$ \mathrm{snr} = \frac{1}{\alpha} \, C \, \frac{E_b}{N_0} . \tag{5-9} $$



To get a better insight into the impact of the nonlinear precoding scheme, it is useful to
compare the results to the spectral efficiency of DPC with Gaussian input (specifying the ultimate
performance), as well as to the spectral efficiency of linear ZF (for both Gaussian and discrete
alphabet input). Another interesting comparison is to the spectral efficiency of generalized THP
(GTHP), which is a popular practical nonlinear precoding alternative to the scheme considered
here (see, e.g., [20]). For the sake of comparison we further particularize henceforth to the case
in which the entries of the channel transfer matrix $\boldsymbol{H}$ are i.i.d. zero-mean circularly symmetric
complex Gaussian random variables, with variance $1/N$ ("a Gaussian $\boldsymbol{H}$"). Note that in this case
the R-transform of the limiting eigenvalue distribution of the random matrix $\boldsymbol{J} = (\boldsymbol{H}\boldsymbol{H}^\dagger)^{-1}$, and
its derivative, simplify to [24]

$$ R(w) = \frac{1 - \alpha - \sqrt{(1 - \alpha)^2 - 4 \alpha w}}{2 \alpha w} , \tag{5-10} $$

$$ R'(w) = \frac{\left( 1 - \alpha - \sqrt{(1 - \alpha)^2 - 4 \alpha w} \right)^2}{4 \alpha w^2 \sqrt{(1 - \alpha)^2 - 4 \alpha w}} . \tag{5-11} $$

9 Note that in the large-system limit the receivers only need information about the state of their own channel,
but not about the states of the other channels due to the self-averaging property which makes the impact of the
other users' channels and data deterministic.
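As a quick numerical companion to (5-10) and (5-11), the sketch below evaluates the R-transform of a Gaussian $\boldsymbol{H}$ and its derivative, and plugs them into the RS energy expression (4-16). It is illustrative only: the function names are hypothetical, the load is assumed to satisfy $0 < \alpha < 1$, and $w \neq 0$ is assumed.

```python
import math


def r_transform(w, alpha):
    """R(w) of (5-10) for J = (H H^dagger)^{-1} with Gaussian H; assumes w != 0, 0 < alpha < 1."""
    s = math.sqrt((1.0 - alpha) ** 2 - 4.0 * alpha * w)
    return (1.0 - alpha - s) / (2.0 * alpha * w)


def r_transform_prime(w, alpha):
    """R'(w) of (5-11), the derivative of R with respect to w."""
    s = math.sqrt((1.0 - alpha) ** 2 - 4.0 * alpha * w)
    return (1.0 - alpha - s) ** 2 / (4.0 * alpha * w ** 2 * s)


def rs_energy(q0, chi0, alpha):
    """RS energy penalty (4-16): q0 * [R(-chi0) - chi0 * R'(-chi0)]."""
    return q0 * (r_transform(-chi0, alpha) - chi0 * r_transform_prime(-chi0, alpha))
```

For negative real arguments the function is increasing, in line with Lemma 4.6, and a central finite difference of `r_transform` reproduces `r_transform_prime`. For this particular R-transform one can also check that $R(-\chi) - \chi R'(-\chi) = 1/\sqrt{(1-\alpha)^2 + 4\alpha\chi}$.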

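Starting with DPC, the limiting spectral efficiency in this setting coincides with the corresponding
spectral efficiency of the dual uplink channel with uniform power distribution [43]. This
follows from the limiting conclusion in [44], and by observing that the optimization problem over
diagonal input covariance matrices, that specifies the maximum achievable sum-rate (see [1] and
references therein), is solved by a uniform power distribution [45]. The spectral efficiency of DPC
is hence given by [43]

$$ C^{\rm dpc}(\mathrm{snr}) = \alpha \log_2\left( 1 + \mathrm{snr} - \frac{1}{4} \mathcal{F}(\mathrm{snr}, \alpha) \right) + \log_2\left( 1 + \alpha \, \mathrm{snr} - \frac{1}{4} \mathcal{F}(\mathrm{snr}, \alpha) \right) - \frac{\log_2 e}{4 \, \mathrm{snr}} \mathcal{F}(\mathrm{snr}, \alpha) , \tag{5-12} $$

where $\mathcal{F}(\mathrm{snr}, \alpha)$ is defined as [46]

$$ \mathcal{F}(\mathrm{snr}, \alpha) = \left( \sqrt{\mathrm{snr} \left( 1 + \sqrt{\alpha} \right)^2 + 1} - \sqrt{\mathrm{snr} \left( 1 - \sqrt{\alpha} \right)^2 + 1} \right)^2 . \tag{5-13} $$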



Regarding linear ZF, we restrict the discussion to the case in which the active user population
can only be controlled through the system load $\alpha$, as is in fact assumed for the nonlinear precoding
scheme (see also the discussion in Section 8). In this setting, as shown, e.g., in [47, 48], the induced
precoding efficiency (2-7) (equivalent here to the inverse multiuser efficiency) converges in the large
system limit to

$$ \zeta_{\rm zf} = \frac{1}{1 - \alpha} , \tag{5-14} $$

and again for Gaussian input the spectral efficiency coincides with the corresponding result in [48]
(see also [43, 46, 49])

$$ C^{\rm zf}(\mathrm{snr}) = \alpha \log_2\left( 1 + (1 - \alpha) \, \mathrm{snr} \right) . \tag{5-15} $$
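The two Gaussian-input benchmarks are simple to evaluate numerically. The following minimal sketch assumes that (5-12), (5-13), (5-15) and the relation (5-9) read $C^{\rm dpc} = \alpha\log_2(1+\mathrm{snr}-\mathcal{F}/4) + \log_2(1+\alpha\,\mathrm{snr}-\mathcal{F}/4) - \mathcal{F}\log_2 e/(4\,\mathrm{snr})$, $C^{\rm zf} = \alpha\log_2(1+(1-\alpha)\mathrm{snr})$ and $\mathrm{snr} = (C/\alpha)\,E_b/N_0$; the function names are hypothetical.

```python
import math


def f_aux(snr, alpha):
    """Auxiliary function F(snr, alpha) of (5-13)."""
    ra = math.sqrt(alpha)
    return (math.sqrt(snr * (1.0 + ra) ** 2 + 1.0)
            - math.sqrt(snr * (1.0 - ra) ** 2 + 1.0)) ** 2


def c_dpc(snr, alpha):
    """DPC spectral efficiency (5-12) in bits/sec/Hz per dimension."""
    f4 = 0.25 * f_aux(snr, alpha)
    return (alpha * math.log2(1.0 + snr - f4)
            + math.log2(1.0 + alpha * snr - f4)
            - math.log2(math.e) * f4 / snr)


def c_zf(snr, alpha):
    """Linear ZF spectral efficiency with Gaussian input (5-15); assumes alpha < 1."""
    return alpha * math.log2(1.0 + (1.0 - alpha) * snr)


def ebn0_db(snr, c, alpha):
    """Eb/N0 in dB obtained by inverting (5-9): snr = (C / alpha) * Eb/N0."""
    return 10.0 * math.log10(alpha * snr / c)
```

For example, at `snr = 10` and `alpha = 0.5` the DPC value exceeds the linear ZF value, as expected for the ultimate performance benchmark, and for very small loads `c_dpc(snr, alpha) / alpha` approaches $\log_2(1 + \mathrm{snr})$.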
The corresponding spectral efficiency with discrete input alphabet can be derived, e.g., following
the guidelines in [50]. Considering the particular case of binary phase shift keying (BPSK) input,
one obtains

The spectral efficiency of linear ZF precoding combined with QPSK input is obtained via the
relation [50]

$$ C^{\rm zf,qpsk}(\mathrm{snr}) = 2 \, C^{\rm zf,bpsk}\left( \frac{\mathrm{snr}}{2} \right) , \tag{5-17} $$

yielding (through (5-9)) $C^{\rm zf,qpsk}\left( \frac{E_b}{N_0} \right) = 2 \, C^{\rm zf,bpsk}\left( \frac{E_b}{N_0} \right)$. The spectral efficiency of GTHP for the
corresponding setting is derived in Appendix F.



6 Lattice Precoding: An RSB Example 



Adhering to [24], we consider in the following a particular example of a discrete relaxed alphabet set
for QPSK signaling, which exhibits replica symmetry breaking. The original QPSK constellation
alphabet is represented by the set

$$ \mathcal{U} = \{ 1 + j, \, -1 + j, \, -1 - j, \, 1 - j \} , \tag{6-1} $$

and quadrature symmetric transmissions are assumed (note that the above definition induces
$\sigma_u^2 = 2$). The relaxed alphabets in this particular example can be represented as points from the
extended lattice

$$ \mathcal{B}_u = \frac{u}{1 + j} \left( (4\mathbb{Z} + 1) \times (4\mathbb{Z} + 1) \right) , \quad \forall u \in \mathcal{U} . \tag{6-2} $$

More specifically, we take

$$ \mathcal{B}_{\pm 1 \pm j} = \pm \{ c_1, c_2, \dots, c_L \} \pm j \{ c_1, c_2, \dots, c_L \} , \tag{6-3} $$

where it is assumed that $-\infty = c_0 < c_1 < \cdots < c_L < c_{L+1} = \infty$. The parameter $L$ thus specifies
the number of lattice points used in the extended alphabet in each dimension, and we particularize
here to the set $\{+1, -3, +5, -7, +9, \dots\}$. The alphabet relaxation scheme is depicted in Figure 3.
Due to the complete quadrature symmetry of this setting, all QPSK constellation points and their
corresponding relaxed alphabet subsets are completely equivalent, and we focus in the following,
for notational convenience, on the QPSK constellation point represented by $u = 1 + j$, and $\mathcal{B}_{1+j}$.
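The relaxed alphabets are easy to enumerate. The sketch below is a minimal illustration, assuming that the $L$ levels per dimension are the first $L$ elements of $\{+1, -3, +5, -7, \dots\}$ sorted increasingly, and that $\mathcal{B}_u$ is obtained from $\mathcal{B}_{1+j}$ through multiplication by $u/(1+j)$ as in (6-2); the function names are hypothetical.

```python
def relaxed_levels(L):
    """First L levels of {+1, -3, +5, -7, ...}, sorted so that c_1 < ... < c_L."""
    return sorted((-1) ** k * (2 * k + 1) for k in range(L))


def relaxed_alphabet(u, L):
    """Relaxed alphabet B_u = (u / (1+j)) * B_{1+j}, with B_{1+j} = {c_m + j c_n}."""
    levels = relaxed_levels(L)
    base = [complex(cm, cn) for cm in levels for cn in levels]
    rotation = u / complex(1, 1)
    return [rotation * x for x in base]
```

For `L = 2` the per-dimension levels are `[-3, 1]`, so each QPSK point is relaxed to four lattice points, one of which is the original constellation point itself.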



Figure 3: Two dimensional lattice based relaxation for QPSK input.

The first step in the analysis is to rewrite (4-8)-(4-11) and obtain the four macroscopic parameters
$\{q_1, p_1, \chi_1, \mu_1\}$ for the current example. Denoting the real and imaginary parts of an
arbitrary point $s \in \mathbb{C}$ as $s_{\rm re} = \Re\{s\}$ and $s_{\rm im} = \Im\{s\}$, the Voronoi region of the lattice point
$x = c_m + j c_n$ is the region in the complex plane for which

$$ s_{\rm re} \in (v_m, v_{m+1}) , \quad s_{\rm im} \in (v_n, v_{n+1}) , \tag{6-4} $$



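where the boundaries of the Voronoi regions are $v_i = \frac{c_i + c_{i-1}}{2}$. Now considering (4-1), recall
that for any given $y, z \in \mathbb{C}$ the lattice point that maximizes the exponent therein is given by
$\operatorname{argmin}_x |f_1 z + g_1 y - e_1 x|$. This implies that a lattice point $x = c_m + j c_n$ is the solution to the
above minimization problem whenever

$$ y_{\rm re} \in \left( \vartheta_m(z_{\rm re}), \vartheta_{m+1}(z_{\rm re}) \right) , \quad y_{\rm im} \in \left( \vartheta_n(z_{\rm im}), \vartheta_{n+1}(z_{\rm im}) \right) , \tag{6-5} $$

where we introduced the real argument function

$$ \vartheta_k(\xi) \triangleq \frac{e_1 v_k - f_1 \xi}{g_1} . \tag{6-6} $$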
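The Voronoi argument above amounts to a per-dimension nearest-level decision. The following minimal sketch assumes sorted real levels `c` (playing the role of $c_1 < \dots < c_L$), positive $e_1$, and the minimization $\operatorname{argmin}_x |f_1 z + g_1 y - e_1 x|$ over $x = c_m + j c_n$; the helper names are hypothetical.

```python
import math


def voronoi_boundaries(c):
    """Boundaries v_i = (c_i + c_{i-1}) / 2 for sorted levels c, padded with -inf and +inf."""
    inner = [0.5 * (c[i] + c[i - 1]) for i in range(1, len(c))]
    return [-math.inf] + inner + [math.inf]


def nearest_level(s, c):
    """Level whose Voronoi interval (v_k, v_{k+1}) contains the real number s."""
    v = voronoi_boundaries(c)
    for k in range(len(c)):
        if v[k] <= s < v[k + 1]:
            return c[k]
    return None  # only reached if s is NaN


def precoder_decision(y, z, e1, f1, g1, c):
    """argmin over x = c_m + j c_n of |f1*z + g1*y - e1*x|, assuming e1 > 0."""
    s = (f1 * z + g1 * y) / e1  # point to be quantized onto the lattice
    return complex(nearest_level(s.real, c), nearest_level(s.imag, c))
```

Because the lattice is a Cartesian product, the real and imaginary parts are quantized independently, which is the mechanism behind the decoupling of the two dimensions noted later in this section.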



Applying this observation to (4-8)-(4-11), and exploiting the quadrature symmetry property,
the derivation simplifies considerably by noticing that the inner integrals therein can be represented
as sums of separate integrals over the regions specified by (6-5). Accordingly, consider the two
real argument functions



, (6-7) 



i& k (g) = 1 e MiCfe[(Migf-ei)cfc+2/ig] 

2\/tt 



e -Ofc(£)-Mi9iCfc)) _ e -(^fe+iCO-jiiifficfe)) 



(6-8) 



Then, following some tedious algebra, it can be shown from (4-8)-(4-11) that the parameters
$\{q_1, p_1, \chi_1, \mu_1\}$ are the solutions to the coupled equations



9i 



Pi 



Xi 



= 2 



Eti©fc(e) 



Pi 



/1M1 
2 
.91 



E 



Em=l C ?" e m(gh -e* Xl 
L 



Efe=i©fe(0 



(6-9) 
(6-10) 
(6-11) 



and 



Mi = [29i (Xi 



li 1 px)R!{-Xi - HiPi 



R(-w) dw - 2[i x xia\ + Mi(?i + 2pi)i?(-Xi - MiPi) 



XL 



(6-12) 



The corresponding energy penalty is obtained by plugging the four solutions into (4-12). Applying
the same approach to Proposition 4.2, the limiting conditional probability of the precoder
output being $x = c_m + j c_n \in \mathcal{B}_{1+j}$ is given by



Pr {x 



o\u=l+j} 



e m (0 



di 



e„(0 



dC 



$= \Pr\{ \Re\{x\} = c_m \,|\, u = 1 + j \} \cdot \Pr\{ \Im\{x\} = c_n \,|\, u = 1 + j \}$ .

(6-13) 

The limiting conditional probabilities that correspond to the rest of the QPSK constellation points
are readily obtained from (6-13) by symmetry considerations. Note also that (6-13) implies that
the real part and the imaginary part of the precoder output $x$ behave as independent random
variables.

Figure 4: The energy penalty per symbol, as a function of the system load $\alpha$, for the two dimensional
discrete alphabet relaxation scheme for QPSK input.

Numerical results for the limiting energy penalty of the discrete lattice relaxation scheme are
plotted in Figure 4. The figure shows the limiting energy penalty (in dB) as a function of the
system load $\alpha$, for the particular case of a Gaussian $\boldsymbol{H}$ and a ZF front-end. Since $\sigma_u^2 = 2$, the
corresponding precoding efficiency (2-7) can be immediately obtained by subtracting 3 dB from
the energy penalty shown in the figure. The results in Figure 4 correspond to alphabet relaxations
with $L = 2$ and $L = 3$. Note that the two curves are essentially indistinguishable and the energy
penalty with $L = 2$ becomes only negligibly larger as $\alpha$ gets close to unity. This implies that
increasing $L$ beyond 2 in this setting provides diminishing returns. Empirical energy penalties
obtained through Monte Carlo simulations are also included in the figure. The results are for
systems in which the number of users is fixed to $K = 8$, $K = 16$, and $K = 32$ (averaged over $10^4$,
$10^3$, and $10^2$ channel realizations, respectively). The energy penalty is shown to decrease with
the system size, and the simulation results exhibit a good match to the limiting energy penalty
predicted by the 1RSB replica analysis. The lower bound for the limiting energy penalty obtained
in [21] is also plotted in this figure which, with appropriate scaling to match the current setting,
is given by



ir 



(6-14) 



Figure 4 shows that the 1RSB prediction approaches the lower bound as the load approaches
unity. Note however that the 1RSB result stays strictly higher than the lower bound. In fact, a
careful numerical examination of the limiting 1RSB energy penalty at $\alpha = 1$ shows that it hits
the value of 7.0744 dB for $L > 4$, while the lower bound in this case is $16/\pi \approx 7.0697$ dB. The
numerical analysis of the limiting energy penalty is considerably simplified in this region of $\alpha$ by
the (numerical) observation that the macroscopic parameter $\chi_1$ approaches 0 as $\alpha \to 1$ (although
it stays strictly positive). The small $\chi_1$ approximation of the equations employed to calculate
the limiting 1RSB energy penalty is briefly discussed in Appendix D. The RSB phenomenon is
demonstrated by considering the limiting energy penalty obtained via the RS approximation, as
stated by Proposition 4.3 (the explicit expression for the current example is given in [24, Eq.
(26)]). As shown in Figure 4, the RS approximation fails to predict the limiting energy penalty
for $\alpha > 0.3$, and in fact it even violates the lower bound (6-14) for $\alpha > 0.55$.

The better accuracy of 1RSB is also visible looking at the zero-temperature entropy. We can
analytically evaluate Proposition 4.5 in the case of a Gaussian $\boldsymbol{H}$, which becomes

$$ \mathcal{S} = \frac{1 - \alpha - \sqrt{(1 - \alpha)^2 + 4 \alpha \chi}}{2 \alpha} + \frac{1 - \alpha}{\alpha} \log \frac{1 - \alpha + \sqrt{(1 - \alpha)^2 + 4 \alpha \chi}}{2 (1 - \alpha)} . \tag{6-15} $$

Figure 5: Zero temperature entropy of the RS solution (solid), and 1RSB solution (dashed), as a
function of the system load $\alpha$ (corresponding to (6-15)).



The entropy for both the RS and 1RSB approximations for a relaxation level $L = 2$ is shown
in Figure 5. Although the 1RSB solution of the above model also has negative zero-temperature
entropy, it is much closer to zero, corresponding to a much weaker instability, and approaches zero
as $\alpha \to 1$. In contrast, the RS entropy drifts away from zero as $\alpha \to 1$.
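The closed form of the zero-temperature entropy for a Gaussian $\boldsymbol{H}$ can be cross-checked against a direct numerical evaluation of (4-18). The sketch below assumes the closed form $\mathcal{S} = \frac{1-\alpha-s}{2\alpha} + \frac{1-\alpha}{\alpha}\log\frac{1-\alpha+s}{2(1-\alpha)}$ with $s = \sqrt{(1-\alpha)^2+4\alpha\chi}$, natural logarithms, and $0 < \alpha < 1$; the function names and the midpoint quadrature are illustrative choices, not taken from the paper.

```python
import math


def r_transform(w, alpha):
    """R(w) of (5-10) for a Gaussian H; assumes w != 0 and 0 < alpha < 1."""
    s = math.sqrt((1.0 - alpha) ** 2 - 4.0 * alpha * w)
    return (1.0 - alpha - s) / (2.0 * alpha * w)


def entropy_closed_form(chi, alpha):
    """Zero-temperature entropy in closed form (assumed reading of (6-15))."""
    a = 1.0 - alpha
    s = math.sqrt(a * a + 4.0 * alpha * chi)
    return (a - s) / (2.0 * alpha) + (a / alpha) * math.log((a + s) / (2.0 * a))


def entropy_numeric(chi, alpha, steps=20000):
    """chi * R(-chi) - int_0^chi R(-w) dw of (4-18), using the midpoint rule."""
    h = chi / steps
    integral = h * sum(r_transform(-(i + 0.5) * h, alpha) for i in range(steps))
    return chi * r_transform(-chi, alpha) - integral
```

Both evaluations agree to high accuracy, and the result is negative for every $\chi > 0$ and tends to zero as $\chi \to 0$, consistent with the discussion following Proposition 4.5.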

The limiting conditional probabilities of (6-13) are plotted in Figure 6, as well as the empirical
conditional probabilities, based on the Monte Carlo simulations employed to produce the energy
penalties of Figure 4. The results correspond to a relaxation level of $L = 2$, and focus on the real
part of the extended alphabet points, given that the real part of the original QPSK constellation
point satisfies $\Re\{u\} = 1$ (recall the decoupling of the real and imaginary parts implied by (6-13)).
The simulation results exhibit again a good match to the limiting analytical 1RSB prediction. It
is also clearly demonstrated that, when the system load $\alpha$ is low, hardly any relaxation is required,
while the probability of using symbols from the extended alphabet set increases as $\alpha$ approaches
unity.



7 Convex Precoding: An RS Example 



This section is devoted to another alphabet relaxation scheme, also introduced in [24] for QPSK
signaling. The key feature of this relaxation scheme is that the extended alphabet set is continuous
and convex, allowing for an efficient solution to the corresponding quadratic programming problem
of minimizing the energy penalty. Convex optimization problems are generally believed not to
exhibit replica symmetry breaking [51]. In certain special cases this has been shown explicitly
[52]. Furthermore, as will be demonstrated in the sequel, the replica symmetric solution for this
alphabet relaxation scheme agrees well with numerical simulations and thus considerably simplifies
the analysis of the limiting regime.

Figure 6: Conditional probabilities of the real part of the precoder output for the two-dimensional
discrete lattice relaxation scheme, given that $\Re\{u\} = 1$.

Denoting

$$ \mathcal{B}_{1+j} = \{ z \in \mathbb{C} : \Re\{z\} \ge 1, \; \Im\{z\} \ge 1 \} , \tag{7-1} $$

the relaxed alphabet subsets are defined by

$$ \mathcal{B}_u = \frac{u}{1 + j} \mathcal{B}_{1+j} , \quad u \in \{ 1 + j, \, -1 + j, \, -1 - j, \, 1 - j \} . \tag{7-2} $$

The alphabet relaxation scheme is depicted in Figure 7, and it is referred to henceforth as convex
relaxation for QPSK (CR-QPSK).

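The RS approximation for the limiting energy penalty with the CR-QPSK relaxation scheme
is obtained through Proposition 4.3, and it is given by the solution to the following fixed point
equation [24, Eq. (30)]

$$ Q\left( \sqrt{\frac{2}{\alpha \mathcal{E}}} \right) = \frac{2 + (\alpha - 1) \mathcal{E} + \sqrt{\frac{\alpha \mathcal{E}}{\pi}} \, e^{-\frac{1}{\alpha \mathcal{E}}}}{2 + \alpha \mathcal{E}} . \tag{7-3} $$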




Note that (7-3) yields finite energy penalties for all loads $0 < \alpha < 2$. Although loads greater than
unity imply that the matrix $\boldsymbol{H}\boldsymbol{H}^\dagger$ in (5-1) is singular, this does not lead to interference at the
receivers in the large system limit, as shown rigorously in [42].

Figure 7: Extended alphabet sets for the convex relaxation precoding scheme for QPSK signaling.
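The fixed point equation for the CR-QPSK energy penalty can be solved with a simple bracketing method. The sketch below is hedged in two ways: it assumes that (7-3) reads $Q(\sqrt{2/(\alpha\mathcal{E})}) = \left[2 + (\alpha-1)\mathcal{E} + \sqrt{\alpha\mathcal{E}/\pi}\,e^{-1/(\alpha\mathcal{E})}\right]/(2+\alpha\mathcal{E})$ with $Q(\cdot)$ the Gaussian tail function, and it assumes a single sign change of the residual on the bracket $[2, 10^6]$. The function names are hypothetical.

```python
import math


def q_func(x):
    """Gaussian tail function Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def cr_qpsk_residual(energy, alpha):
    """Left-hand side minus right-hand side of the assumed form of (7-3)."""
    ae = alpha * energy
    lhs = q_func(math.sqrt(2.0 / ae))
    rhs = (2.0 + (alpha - 1.0) * energy
           + math.sqrt(ae / math.pi) * math.exp(-1.0 / ae)) / (2.0 + ae)
    return lhs - rhs


def cr_qpsk_energy(alpha, lo=2.0, hi=1.0e6, iters=200):
    """Bisection for the energy penalty; lo = 2 is the energy of unrelaxed QPSK."""
    f_lo = cr_qpsk_residual(lo, alpha)
    if f_lo * cr_qpsk_residual(hi, alpha) > 0.0:
        raise ValueError("no sign change on the bracket")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = cr_qpsk_residual(mid, alpha)
        if (f_mid < 0.0) == (f_lo < 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `10 * math.log10(cr_qpsk_energy(alpha))` gives the penalty in dB. For very small loads the solution tends to 2 (no penalty beyond the QPSK symbol energy), and it grows with the load.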



Numerical results for the limiting energy penalty of CR-QPSK are plotted in Figure 8. Empirical
results based on Monte Carlo simulations are provided as well. These results were obtained by
fixing the number of users to $K = 32$, and averaging over 1000 channel realizations. The results
exhibit an excellent match to the limiting RS analytical results, thus supporting the validity of the
RS approximation. The corresponding results for the discrete lattice-based alphabet relaxation
scheme of Section 6 are also provided for the sake of comparison, and it is clearly observed that
in terms of the limiting energy penalty, the discrete scheme is superior to the CR-QPSK scheme
for all $\alpha \in (0, 1]$. The limiting energy penalty difference approaches its maximum value of 2.41 dB
at $\alpha = 1$. As will be shown in Section 8, however, the comparison becomes more subtle when
spectral efficiency is investigated.

[Figure 8, plot "Energy Penalty Comparison (QPSK)"; legend: 1RSB Solution (L=2); Discrete Lattice (L=2), Simulation Results: K=32; CR-QPSK, RS Solution; CR-QPSK, Simulation Results: K=32]

Figure 8: The energy penalty per symbol as a function of the system load $\alpha$, for the CR-QPSK
alphabet relaxation scheme. Corresponding results for the discrete alphabet relaxation scheme of
Section 6 are provided for comparison.

The RS approximation of the limiting conditional distribution of the precoder outputs is obtained
using Proposition 4.4. The idea here is to start from a discretized version of the continuous
CR-QPSK relaxed alphabet set, and obtain the limiting conditional distribution of each relaxed
alphabet point using (4-17). The final step is then to take the limit as the areas of the Voronoi
cells corresponding to each such point vanish. Using this approach, while restricting the discussion
to a Gaussian $\boldsymbol{H}$ and focusing for convenience on the QPSK constellation point $u = 1 + j$, one
gets the corresponding conditional probability density function (pdf)



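$$ f^{\rm cr\text{-}qpsk}_{X|U}(x \,|\, u = 1 + j) = Q_1^2 \, \delta(x_{\rm re} - 1) \delta(x_{\rm im} - 1) + Q_1 \frac{1}{\sqrt{\pi \alpha \mathcal{E}^{\rm cr\text{-}qpsk}}} e^{-\frac{x_{\rm im}^2}{\alpha \mathcal{E}^{\rm cr\text{-}qpsk}}} \delta(x_{\rm re} - 1) U(x_{\rm im} - 1) + Q_1 \frac{1}{\sqrt{\pi \alpha \mathcal{E}^{\rm cr\text{-}qpsk}}} e^{-\frac{x_{\rm re}^2}{\alpha \mathcal{E}^{\rm cr\text{-}qpsk}}} \delta(x_{\rm im} - 1) U(x_{\rm re} - 1) + \frac{1}{\pi \alpha \mathcal{E}^{\rm cr\text{-}qpsk}} e^{-\frac{x_{\rm re}^2 + x_{\rm im}^2}{\alpha \mathcal{E}^{\rm cr\text{-}qpsk}}} U(x_{\rm re} - 1) U(x_{\rm im} - 1) , \quad x_{\rm re}, x_{\rm im} \in \mathbb{R} , \tag{7-4} $$

where we decompose the complex argument as $x = x_{\rm re} + j x_{\rm im}$, $U(x)$ denotes the unit step function,
$\mathcal{E}^{\rm cr\text{-}qpsk}$ denotes the limiting energy penalty of the CR-QPSK scheme obtained from (7-3), and the
constant $Q_1$ is defined as

$$ Q_1 = Q\left( -\sqrt{\frac{2}{\alpha \mathcal{E}^{\rm cr\text{-}qpsk}}} \right) . \tag{7-5} $$

The conditional pdf given the rest of the QPSK constellation points (i.e., $u \in \{-1 + j, -1 - j, 1 - j\}$)
is obtained in an analogous manner, while considering the full symmetry of the extended
constellation.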
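The structure of this pdf (a mass point at the corner, masses on the two boundary rays, and a density in the interior) can be illustrated by sampling. The sketch below rests on an assumption that should be read as such: under the RS description with a Gaussian $\boldsymbol{H}$, each real dimension of the precoder output given $u = 1 + j$ is modeled as a zero-mean Gaussian variable of variance $\alpha\mathcal{E}/2$ projected onto $[1, \infty)$, so that the probability of sitting exactly at 1 is $Q_1 = Q(-\sqrt{2/(\alpha\mathcal{E})})$. The function names and the energy value used in the example are hypothetical and not results from the paper.

```python
import math
import random


def q_func(x):
    """Gaussian tail function Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mass_at_one(alpha, energy):
    """Per-dimension boundary mass Q_1 (assumed reading of (7-5))."""
    return q_func(-math.sqrt(2.0 / (alpha * energy)))


def sample_precoder_output(alpha, energy, rng):
    """One sample of x given u = 1+j: a complex Gaussian projected onto the quadrant."""
    sigma = math.sqrt(alpha * energy / 2.0)  # standard deviation per real dimension
    return complex(max(1.0, rng.gauss(0.0, sigma)),
                   max(1.0, rng.gauss(0.0, sigma)))
```

Drawing many samples with `rng = random.Random(1)` and counting those with real part equal to 1 reproduces `mass_at_one`, and the fraction sitting exactly at $1 + j$ is close to its square.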



Returning to (7-4), note that this pdf contains masses on the boundaries of $\mathcal{B}_{1+j}$, and in
particular a mass point at the original QPSK constellation point (i.e., $x = 1 + j$). Plots that
demonstrate this behavior of the pdf as a function of $\alpha$ are provided in Figure 9. The upper left
plot shows the weight of the mass point at $x = 1 + j$, as a function of $\alpha$ (corresponding to $Q_1^2$). The
lower left plot shows the pdf mass on the lower boundary of the extended alphabet subset $\mathcal{B}_{1+j}$
(i.e., when the imaginary part of the precoder's output is fixed to $x_{\rm im} = 1$). The plots on the right
show the pdf on the interior of $\mathcal{B}_{1+j}$ for $\alpha = 0.1$ (upper right) and $\alpha = 0.9$ (lower right). The
increase in probability of using extended alphabet points as the system load increases is clearly
demonstrated in the figure.

Figure 9: The limiting conditional distribution of the precoder output for the CR-QPSK scheme,
given that $u = 1 + j$.

Additional numerical results comparing the analytical RS approximation for the pdf to empirical
simulation results are shown in the upper left plot of Figure 9 and in Figure 10. The upper
left plot of Figure 9 compares the probability mass at $x = 1 + j$ to corresponding simulation
results for $K = 32$ (averaged over 1000 channel realizations). The corresponding comparison for
the cumulative distribution function (CDF) of $x_{\rm re}$, given that $\Re\{u\} = 1$, is shown in Figure 10.
The left plot shows the CDF for $\alpha = 0.7442$ (i.e., for $N = 43$), while the right plot shows the
results for unit load. As observed, all empirical results exhibit a very good match to the analytical
RS approximation, further supporting the validity of the RS analysis for the CR-QPSK scheme.

Figure 10: CDF of the real part of the precoder's output for the CR-QPSK scheme, given that
$\Re\{u\} = 1$.



8 Spectral Efficiency Comparison 

The two previous sections focused on the transmitting end of the system, and investigated the limiting behavior of the precoder output while employing two particular alphabet relaxation schemes. In the following we turn to investigate the limiting behavior of the system as a whole, by considering the normalized spectral efficiency in view of the analysis of Section 5. Accordingly, we restrict the discussion to a ZF front-end and a Gaussian $H$, and apply Proposition 5.1 to obtain the spectral efficiencies of the discrete lattice-based alphabet relaxation scheme and of CR-QPSK.



Starting with the discrete scheme, the spectral efficiency is obtained by incorporating (4-12) and (6-13) into (5-4)-(5-8). The observation made in Section 6 regarding the independence of the real and imaginary parts of the precoder's output leads to the following conclusion. The achievable rate in (5-7) for QPSK input can be obtained by treating QPSK signaling as two independent corresponding BPSK signaling settings. Accordingly, the conditional precoder output probabilities, given a real BPSK input of $u = 1$, are given by (cf. (6-13))



e*(0 



d? 



where we set $\mathscr{B}_1 = \{c_k\}_{k=1}^{L}$, and the conditional probabilities given $u = -1$ can be immediately obtained from symmetry considerations. It is then straightforward to show that the corresponding spectral efficiency is given by



C bpsk (p) = 



d£ 



(8-2) 



The spectral efficiency with QPSK input is then obtained through the relation C qpsk (^" 
2C* bpsk (||), while substituting 



«4sbi 



(8-3) 



Turning to the CR-QPSK scheme, and applying Proposition 5.1, it can be shown that the conditional pdf of the equivalent single user channel output $y$, given an input $u$, is equal to (see footnote 10)



f^ k (y\u = ±l±j) 



1 



1 



Q 2 (±y re )e i+p°^ psk +Qie 



Q 2 (±y im )e i+ P =^-> sk +Q ie 



-(y re Tl) 2 P 



-(y im =Fi) 2 P 



(8-4) 



where we decomposed the complex argument as $y = y_{\mathrm{re}} + j y_{\mathrm{im}}$, and the real argument function $Q_2(\xi)$ is defined as



Qa(0 = Q 



2 pa<f cr - qp5k (l - + 1 



QI#cr-qpsk ^! + pa< #cr-qpsk ^ 

The marginal distribution of the equivalent single user channel output y is given by 



(8-5) 



2Vn 



Q2(Vre) + Q2(~Vre) c - 

Q2{y\m) + Q2{-y\w) ( 



i+pa6 scr -qp sk 



Qi (■ 



( , (Vr.-l)> +e -(s/r.+l) / p 



i +pQ '3cr-qp 5 k _|_ Q i | c -(y im -l) 2 p _|_ e -(y im + l) 2 p 



(8-6) 



Finally, following (5-8) and accounting for the inherent symmetry in (8-4) and (8-6), the spectral efficiency of the CR-QPSK scheme is given by



C crqpsk (p) =2a 1- 



\fl + pa£"-w sk 



Q 2 (s)e i+ P =Vr-qp 5 k + g lC -(«-i) 2 p 



log 2 [ 1+ Q'(-°)* -^f + ^gggg^ xe^ | ^ | (8 _ ?) 
Q 2 (s)e !+p°*' r - qpsk + VTTpo^-qp^Qie-^- 1 ) 2 ' 



Comparative numerical spectral efficiency results are plotted in Figure 11. The figure shows the spectral efficiencies of the discrete extended alphabet relaxation scheme (while taking $L = 2$), and of the CR-QPSK scheme, as well as the spectral efficiency of linear ZF precoding for Gaussian and QPSK input (see (5-15) and (5-17), respectively), and the spectral efficiency of GTHP with QPSK input (following (F-32)-(F-33)). The spectral efficiencies were evaluated for the optimum choice of the system load $\alpha$. The optimum load is a function of $E_b/N_0$ and is shown in Figure 12. In Figure 11 the DPC spectral efficiency (5-12) is also provided for comparison, evaluated both for $\alpha = 1$ and for $\alpha \to \infty$ (specifying the ultimate performance). The optimization with respect to $\alpha$ emphasizes its role as a crucial system design parameter, facilitating the proper working point for each transmission scheme, per each $E_b/N_0$. It also naturally translates to a practical scheduling scheme, specifying the desired number of simultaneously active scheduled users per transmit antenna (see, e.g., [45]).

10 In the following notation the $\pm$ signs are designated with adherence to the corresponding signs of the real and imaginary parts of $u$. For example, for $u = 1 + j$ one should substitute $Q_2(y_{\mathrm{re}})$, $e^{-(y_{\mathrm{re}}-1)^2\rho}$, $Q_2(y_{\mathrm{im}})$, and $e^{-(y_{\mathrm{im}}-1)^2\rho}$ in the corresponding terms in (8-4).

Figure 11: Spectral efficiency results optimized with respect to the load $\alpha$. (Plot "Spectral Efficiency (Optimum $\alpha$)" versus $E_b/N_0$ [dB]; curves: DPC ($\alpha = 1$), DPC ($\alpha \to \infty$), Linear ZF - Gaussian Input, Linear ZF - QPSK Input, Nonlinear Precoding - QPSK Input, Nonlinear Precoding - QPSK Input (CR), GTHP-QPSK.)

Figure 12: System load that maximizes spectral efficiency as a function of $E_b/N_0$. (Plot "Optimum $\alpha$" versus $E_b/N_0$ [dB]; curves: Linear ZF - Gaussian Input, Linear ZF - QPSK Input, Nonlinear Precoding - QPSK Input, Nonlinear Precoding - QPSK Input (CR), GTHP-QPSK.)

The results indicate that nonlinear precoding can provide significant performance enhancement for medium to high $E_b/N_0$ values. The discrete lattice-based relaxation scheme is shown to outperform linear ZF with QPSK input for $E_b/N_0 > 3.43$ dB. The beneficial effect of the lattice relaxation scheme becomes more pronounced the closer the spectral efficiency gets to the upper limit of 2 bits/sec/Hz per transmit antenna. For example, a spectral efficiency of 1.75 bits/sec/Hz can be obtained with lattice relaxation already at $E_b/N_0 \approx 7.66$ dB, whereas linear ZF requires an additional 7.26 dB for the same spectral efficiency. In fact, the QPSK-based lattice precoding scheme is shown to marginally outperform linear ZF with Gaussian input for 4.19 dB $< E_b/N_0 <$ 7.26 dB. The lattice relaxation scheme also outperforms GTHP for all $E_b/N_0$ values, becoming more effective for medium to high $E_b/N_0$ (for example, GTHP needs 2.93 dB more energy per bit to achieve 1.75 bits/sec/Hz; see footnote 11). The gap from the DPC upper bound is, however, still essentially retained (4.49 dB at 1.75 bits/sec/Hz, considering DPC with $\alpha = 1$, to make a fairer comparison).
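
To put the quoted energy gaps into perspective, the short Python sketch below converts them from decibels to linear energy-per-bit factors. Only the dB figures are taken from the text; the helper name `db_to_linear` and the dictionary labels are illustrative.

```python
def db_to_linear(gap_db: float) -> float:
    """Convert an energy (power) ratio given in decibels to a linear factor."""
    return 10.0 ** (gap_db / 10.0)


# Gaps quoted in the text at a spectral efficiency of 1.75 bits/sec/Hz.
gaps_db = {
    "linear ZF (QPSK) relative to lattice relaxation": 7.26,
    "GTHP relative to lattice relaxation": 2.93,
    "lattice relaxation relative to DPC (alpha = 1)": 4.49,
}

for label, gap in gaps_db.items():
    print(f"{label}: {gap:.2f} dB, i.e. a factor of {db_to_linear(gap):.2f}")
```

Read this way, the 7.26 dB gap means that linear ZF with QPSK input needs more than five times the energy per bit of the lattice relaxation scheme at that spectral efficiency.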



As for the CR-QPSK scheme, Figure 11 shows that it also provides a considerable performance enhancement over linear ZF with QPSK input. It is outperformed by the lattice relaxation scheme for 4.38 dB $< E_b/N_0 <$ 9.40 dB. It performs better at low values of $E_b/N_0$, and in fact it even negligibly outperforms linear ZF with Gaussian input in the low $E_b/N_0$ region. Moreover, unlike the discrete scheme, CR-QPSK outperforms linear ZF precoding (with QPSK input) for all $E_b/N_0$ values. Furthermore, it outperforms lattice relaxation in the high $E_b/N_0$ region, since it allows for loads up to $\alpha < 2$ and therefore its spectral efficiency is no longer upper bounded by 2 bits/sec/Hz, but rather by 4 bits/sec/Hz per transmit antenna. The convergence to the limiting spectral efficiency of 4 bits/sec/Hz at high $E_b/N_0$ is, however, rather slow. CR-QPSK also outperforms GTHP for all $E_b/N_0$ values, but the advantage is more significant for high $E_b/N_0$, where overloading is employed. These results are of particular interest since the CR-QPSK scheme lends itself to efficient implementation, whereas the discrete relaxation scheme involves the solution of an NP-hard optimization problem. It is also important to note that, as shown in Figure 8, the CR-QPSK scheme is always inferior to the lattice relaxation scheme in terms of the limiting energy penalty. Hence, in view of the observations made here, one can conclude that restricting the analysis to the energy penalty alone provides only limited insight into the behavior of large coded systems, as it essentially focuses only on the transmitter, while ignoring the impact of the nonlinear precoding scheme on the receiver.

11 Note that in general the modulo-receiver employed by GTHP induces poor performance in the low spectral efficiency region (see Appendix F).



9 Concluding Remarks 

The replica symmetry breaking ansatz of statistical physics was employed in this paper to investigate the large system limit behavior of nonlinear precoding for the MIMO Gaussian broadcast channel based on linear zero-forcing and alphabet relaxation. For lattice relaxations, the replica symmetric ansatz was shown to yield misleading results for system loads greater than approximately 0.3, while the one-step replica symmetry breaking ansatz provides sensible results for any load. For exact results, however, multiple-step replica symmetry breaking must be considered.

Introducing a nonlinear superchannel comprising the actual channel and the precoder, allows 
for a Markov chain description of an individual user's channel. This enables the calculation 
of mutual information and spectral efficiency in the large system limit. While convex QPSK 
relaxations are significantly outperformed by lattice relaxations in terms of transmitted energy 
per bit, they are very competitive when combined with strong error-correction coding as shown 
by the spectral efficiency analysis. Except for medium signal-to-noise ratios, they are superior 
to lattice relaxations. Both schemes were shown to outperform Tomlinson-Harashima precoding 
with QPSK input for all signal-to-noise ratios. 

The combination of polynomial complexity and high spectral efficiency makes convex alphabet relaxation schemes, as introduced in [24], a promising alternative both to the NP-hard lattice relaxations and, owing to their superior performance, to Tomlinson-Harashima precoding. The results motivate the search for convex schemes amenable to efficient implementation. Additional examples of extended alphabets are currently being investigated; see [25] for preliminary results. Note, however, that the problem of finding the optimum precoding scheme that maximizes the spectral efficiency in this framework is not at all trivial, as the corresponding equivalent channel statistics depend, in this setting, on the choice of input distribution and extended alphabet sets.



A Higher RSB Orders 



The $r$-step RSB ansatz reads

$$ Q = q_r \mathbf{1}_{n\times n} + \sum_{i=1}^{r} p_r^{(i)}\, I_{\frac{n\beta}{\mu_r^{(i)}}} \otimes \mathbf{1}_{\frac{\mu_r^{(i)}}{\beta}\times\frac{\mu_r^{(i)}}{\beta}} + \frac{\chi_r}{\beta} I_{n\times n} \tag{A-1} $$

using the constants $\{q_r, p_r^{(1)}, \ldots, p_r^{(r)}, \chi_r, \mu_r^{(1)}, \ldots, \mu_r^{(r)}\}$. The limit as $r \to \infty$ is called full RSB and gives the exact solution to the problem [37]. The particular temperature-dependent scaling of some parts of $Q$ is used to evaluate the free energy at zero temperature without getting divergent terms. If a finite temperature is of interest, a different scaling may be considered. Plugging (A-1) into (3-25), while exploiting the particular structure of $Q$, we find:
Proposition A.1 For any temperature, the energy for $r$-step RSB is

m = 



Xr + X! iti'V 



)JS) 



R' -Xr-E^W 



Xr T Zjj=l M 1 * 



1=1 



j=2 VPr pr 



+ £/ 



(i) 



Xr Xr 



R(-Xr) , 



(A-2) 



where $R'(\cdot)$ denotes the derivative of the function $R(\cdot)$.

In order to proceed to full RSB, the limit $r \to \infty$ must be taken. Naively, one might think this would make the sums in (A-2) diverge. However, the macroscopic parameters are determined by the saddle point equations, which guarantee that the sums stay finite through decreasing the macroscopic parameters. Thus, we introduce a continuum of macroscopic parameters $\mu(x)$ and $p(x)$, taken over $0 \le x \le 1$, such that



q = q r , 

X = Xr , 

p(i/r) = pW 
MM = 

P(0) = 9r , 

m(o) = 1 , 



and the function 

l 

X 

Accordingly, we find for the energy in the limit r — > oo 

Sf(0)' 



(A-3) 
(A-4) 
(A-5) 
(A-6) 
(A-7) 
(A-8) 



(A-9) 



g(fi) = gSf(0)-R'[Sf(0)] 



M (0) 



fl[fr(0)] 

1 Sf (x) fl[Sf (i)] d 4r\ + ( + | ) J2[^(l)] 



M 2 (*) VM1) /3 



(A-10) 



Using integration by parts, (A-10) simplifies to 



<?(/?) = [Sf (*) i?[^(x)]]'L _ n + § R[9(l)] + 



d[&{x)R[&(x)]] 



(A-ll) 
The functions $p(x)$ and $\mu(x)$ must be determined by the respective saddle point equations.



B Proofs for 1RSB 



B.1 Proposition 4.1



The joint distribution of the entries of the vector $x$, conditioned on both the input vector $u$ and the channel transfer matrix $H$, is given for a non-zero temperature by the Boltzmann distribution

$$ P_B(x|H,u) = \frac{1}{Z}\, e^{-\beta x^\dagger J x} \tag{B-1} $$

where $Z$ is the partition function defined in (3-4). Taking the limit $\beta \to \infty$ (zero temperature), the denominator in (B-1) is dominated by its maximum value term, and the limiting joint distribution of the entries of $x$, conditioned on all inputs, converges to the Dirac measure at $\arg\min_{x\in\mathscr{B}_u} x^\dagger J x$, corresponding to the minimum normalized energy penalty, as given by Proposition 4.1.
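
The concentration of the Boltzmann distribution on the minimum-energy state can be checked on a toy example. The Python sketch below uses four hypothetical energy values standing in for $x^\dagger J x$ over a finite candidate set; the numbers and the function name `boltzmann_weights` are assumptions made for illustration and are not taken from the paper.

```python
import math


def boltzmann_weights(energies, beta):
    """Return the probabilities exp(-beta * E) / Z over a finite set of states."""
    e_min = min(energies)  # subtract the minimum so large beta does not underflow Z
    unnormalized = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(unnormalized)
    return [w / z for w in unnormalized]


# Hypothetical energies x^dagger J x for four candidate vectors x.
energies = [1.3, 0.7, 2.1, 0.9]

for beta in (0.0, 1.0, 10.0, 100.0):
    probs = boltzmann_weights(energies, beta)
    print(beta, [round(p, 4) for p in probs])
```

At $\beta = 0$ all states are equally likely, while for growing $\beta$ the probability mass moves to the state with the smallest energy, which is the zero-temperature behavior used in the proof.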

To prove Proposition 4.1 we will need to evaluate the free energy averaged over all realizations of $u$ and $H$. For future convenience, we also include the dummy variable $h$ and the function $V(\cdot)$ defined in (3-14) and rewrite the free energy as

$$ -\beta\mathscr{F}(\beta,h) = \lim_{K\to\infty}\frac{1}{K} E_{u,H}\{\log Z(h;u,H)\} = \lim_{K\to\infty}\sum_{u} P_U(u)\left(\frac{1}{K} E_H\{\log Z(h;u,H)\}\right) \tag{B-2} $$



where $Z(h;u,H)$ is given by (3-15). The second equality is a manifestation of the underlying assumption that the coded symbols of all users are drawn randomly and independently of the channel transfer matrix $H$. In view of this formulation, we consider now the limit of the term in the parentheses above

$$ \lim_{K\to\infty}\frac{1}{K} E_H\{\log Z(h;u,H)\}. \tag{B-3} $$

As shown later on, this inner limit is a deterministic quantity, for almost every realization of the input vector $u$. It will hence be concluded that in fact

$$ -\beta\mathscr{F}(\beta,h) = \lim_{K\to\infty}\frac{1}{K} E_H\{\log Z(h;u,H)\}. \tag{B-4} $$



As indicated earlier, the key tool in the derivation of the above quantity is the replica method of statistical physics, using the identity

$$ E_H\{\log Z(h;u,H)\} = \lim_{n\to 0}\frac{1}{n}\log E_H\{[Z(h;u,H)]^n\} \tag{B-5} $$

and following the outline in Section 3. With that in mind, the quantity $[Z(h;u,H)]^n$ is regarded as consisting of $n$ identical replicas of the original (unnormalized) probability model in the following way [28]

$$ [Z(h;u,H)]^n = \left(\sum_{x\in\mathscr{B}_u} e^{-\beta V(h,\xi,v,x,u)}\, e^{-\beta x^\dagger J x}\right)^n. \tag{B-6} $$
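
The identity (B-5) can be checked numerically on a toy example. The sketch below replaces the partition function by a hypothetical positive random variable $Z$ with three possible values (an assumption made purely for illustration) and evaluates $\frac{1}{n}\log E\{Z^n\}$ for decreasing real $n$.

```python
import math


def replica_estimate(values, probs, n):
    """(1/n) * log E[Z^n] for a discrete positive random variable Z."""
    moment = sum(p * z ** n for z, p in zip(values, probs))
    return math.log(moment) / n


def mean_log(values, probs):
    """E[log Z], the quantity the replica identity is meant to recover."""
    return sum(p * math.log(z) for z, p in zip(values, probs))


# Hypothetical toy distribution for Z (not derived from the channel model).
values = [0.5, 2.0, 8.0]
probs = [0.2, 0.5, 0.3]

exact = mean_log(values, probs)
for n in (1.0, 0.1, 0.01, 0.001):
    print(n, replica_estimate(values, probs, n), exact)
```

The estimate approaches $E\{\log Z\}$ from above as $n \to 0$, with a gap that shrinks roughly linearly in $n$. In the replica method the moment is first computed for integer $n$ and then continued to real $n$, a step this toy example does not capture.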



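Interchanging the limits of $K \to \infty$ and $n \to 0$, the focus is first on the derivation of the limit

$$ \Xi_n \triangleq \lim_{K\to\infty}\frac{1}{K}\log E_H\{[Z(h;u,H)]^n\} = \lim_{K\to\infty}\frac{1}{K}\log E_H\left\{\sum_{\{x_a\}} e^{-\beta\sum_{a=1}^n V(h,\xi,v,x_a,u)}\, e^{-\mathrm{Tr}\{\beta J\sum_{a=1}^n x_a x_a^\dagger\}}\right\}. \tag{B-7} $$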

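Since the first exponential term within the expectation is independent of the channel transfer matrix $H$, $\Xi_n$ can be rewritten as

$$ \Xi_n = \lim_{K\to\infty}\frac{1}{K}\log\left(\sum_{\{x_a\}} e^{-\beta\sum_{a=1}^n V(h,\xi,v,x_a,u)}\, E_H\left\{e^{-\mathrm{Tr}\{\beta J\sum_{a=1}^n x_a x_a^\dagger\}}\right\}\right). \tag{B-8} $$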

The inner expectation in (B-8) is the Harish-Chandra-Itzykson-Zuber integral (see [24] and references therein), and the objective here is its evaluation for fixed-rank matrices $\beta\sum_{a=1}^n x_a x_a^\dagger$, in the large $K$ limit. This problem was recently considered in [53], and invoking Theorem 1.7 therein, (B-8) can be represented for large $K$ as (see footnote 12)

$$ \lim_{K\to\infty}\frac{1}{K}\log\sum_{\{x_a\}} e^{-\beta\sum_{a=1}^n V(h,\xi,v,x_a,u)}\, e^{-K\sum_{a=1}^n\int_0^{\lambda_a} R(-w)\,dw + o(K)} \tag{B-9} $$

where $R(w)$ is the $R$-transform of the limiting eigenvalue distribution of the matrix $J$, and $\{\lambda_a\}$ denote the eigenvalues of the $n\times n$ matrix $\beta Q$ with $Q$ defined through (see footnote 13)

$$ Q_{ab} = \frac{1}{K} x_a^\dagger x_b \triangleq \frac{1}{K}\sum_{k=1}^K x_{ak}^* x_{bk}. \tag{B-10} $$
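
The replica overlap matrix in (B-10) is simply a normalized Gram matrix of the replica vectors. A minimal Python sketch, using two hypothetical replica vectors of length $K = 4$ with QPSK-like entries (values chosen only for illustration):

```python
def overlap_matrix(replicas):
    """Q[a][b] = (1/K) * sum_k conj(x_a[k]) * x_b[k] for a list of replica vectors."""
    K = len(replicas[0])
    return [
        [sum(xa[k].conjugate() * xb[k] for k in range(K)) / K for xb in replicas]
        for xa in replicas
    ]


# Hypothetical toy example: n = 2 replicas, K = 4 users.
x1 = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
x2 = [1 + 1j, -1 - 1j, -1 + 1j, 1 - 1j]

Q = overlap_matrix([x1, x2])
for row in Q:
    print(row)
```

The diagonal entries hold the per-symbol energy of each replica, and the off-diagonal entries measure the correlation between replicas; by construction $Q$ is Hermitian.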



Since additive exponential terms of order $o(K)$ have no effect on the results in the limiting regime as $K \to \infty$, due to the factor $1/K$ outside the logarithm in (B-9) (this shall become clear in the derivation to follow), any such terms are dropped henceforth for notational simplicity.



In order to calculate the summation in (B-9), the procedure employed in [24] is repeated here and the $Kn$-dimensional space spanned by the replicas is split into subshells by means of (3-24).
Assuming = Q, 3 n can be represented as 



$$ \lim_{K\to\infty}\frac{1}{K}\log\int e^{K\mathscr{L}}\, e^{K\mathscr{I}(Q)}\, e^{-K\mathscr{G}(Q)}\,\mathscr{D}Q \tag{B-11} $$



12 $o(K)$ is used here to denote quantities that satisfy $\lim_{K\to\infty} o(K)/K = 0$.
13 Here [53, Theorem 1.7] is applied individually for all given vectors $\{x_a\}$.
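where

$$ \mathscr{D}Q = \prod_{a=1}^n\left(dQ_{aa}\prod_{b=a+1}^n d\Re(Q_{ab})\, d\Im(Q_{ab})\right) \tag{B-12} $$

is the integration measure,

$$ \mathscr{G}(Q) = \sum_{a=1}^n\int_0^{\beta\lambda_a(Q)} R(-w)\,dw \tag{B-13} $$

$$ = \sum_{a=1}^n\int_0^{\beta}\lambda_a(Q)\, R(-w\lambda_a(Q))\,dw \tag{B-14} $$

$$ = \int_0^{\beta}\mathrm{Tr}\left[Q R(-wQ)\right] dw, \tag{B-15} $$

since the trace is the sum of the eigenvalues,

$$ \mathscr{L} = -\frac{\beta}{K}\sum_{a=1}^n V(h,\xi,v,x_a,u), \tag{B-16} $$

and

$$ e^{K\mathscr{I}(Q)} = \sum_{\{x_a\}}\prod_{a=1}^n\left(\delta(x_a^\dagger x_a - KQ_{aa})\prod_{b=a+1}^n\delta(\Re[x_a^\dagger x_b - KQ_{ab}])\,\delta(\Im[x_a^\dagger x_b - KQ_{ab}])\right) \tag{B-17} $$

is the probability weight of the subshell.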

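Starting with $e^{K\mathscr{I}(Q)}$, we follow [24] and represent the Dirac measures using the inverse Laplace transform. This is performed by introducing the complex variables $\{\tilde{Q}^{(I)}_{ab}\}$, $1 \le a \le b \le n$, and $\{\tilde{Q}^{(Q)}_{ab}\}$, $1 \le a < b \le n$, and defining the matrix $\tilde{Q}$ with elements (taking $a < b$)

$$ \tilde{Q}_{aa} = \tilde{Q}^{(I)}_{aa}, \tag{B-18} $$

$$ \tilde{Q}_{ab} = \tfrac{1}{2}\left(\tilde{Q}^{(I)}_{ab} - j\tilde{Q}^{(Q)}_{ab}\right), \tag{B-19} $$

$$ \tilde{Q}_{ba} = \tfrac{1}{2}\left(\tilde{Q}^{(I)}_{ab} + j\tilde{Q}^{(Q)}_{ab}\right). \tag{B-20} $$

Denoting by $P$ the Hermitian matrix with elements $P_{ab} = x_a^\dagger x_b - KQ_{ab}$, this yields

$$ \delta(P_{aa}) = \int_{\mathcal{J}} e^{\tilde{Q}_{aa}P_{aa}}\,\frac{d\tilde{Q}^{(I)}_{aa}}{2\pi j}, \tag{B-21} $$

$$ \delta(\Re\{P_{ab}\})\,\delta(\Im\{P_{ab}\}) = \int_{\mathcal{J}^2} e^{\tilde{Q}^{(I)}_{ab}\Re\{P_{ab}\} - \tilde{Q}^{(Q)}_{ab}\Im\{P_{ab}\}}\,\frac{d\tilde{Q}^{(I)}_{ab}\, d\tilde{Q}^{(Q)}_{ab}}{(2\pi j)^2} \tag{B-22} $$

where the integration is over $\mathcal{J} = (t - j\infty, t + j\infty)$, for some $t \in \mathbb{R}$ (note that $P_{ab} = P_{ba}^*$).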
Substituting in (B-17) and combining with (B-16), it follows that



e KX(Q) e KL 



E 

{*a} 



.v,x a .u) ~ ~ 

e"' 6 e »=i DQ 



jn 



v -KTr(QQ) | e 



(B-24) 



DQ 



where 



n { rlf) {1) n ^h {1) Jf) {Qy 



11 \ 2irj 11 (2tt7') 

0=1 \ J b=a+l y J ' 



;Y2 



(B-25) 



Considering the inner summation in (B-24), then rearranging terms and using (3-14), the expression can be rewritten as



Defining 



one finally gets 



K 



n E « 

fc=i{x a e.« Ufe } 



E««^^« +>»/3 E i{(«a,«j=)=(e,«)} 



M k (Q) = Y, e ' 



e KI(Q) e K~L = 



-KTt(QQ)+J2 logM k (Q) _ 

e *=i DQ 



(B-26) 
(B-27) 

(B-28) 



Now, using the underlying assumption that the coded symbols transmitted by different users are i.i.d., one can apply the strong law of large numbers for $K \to \infty$ to get



1 K 

log M(Q) 4 _^ bg M fc (Q) 



(B-29) 

„ , E VaQ* b ] E !{(*<»,«)=(£,«)} 
log ^ eW / »=i dFj/Cu) (B-30) 

= / log ^ c -=i dF^(u) , (B-31) 

where the convergence is in the almost sure sense, for any extended alphabets such that the expectation in (B-30) exists. Note that this observation implies that any randomness due to $u$ in the RHS of (B-11) effectively vanishes at the large system limit, due to the normalization with respect to $K$ outside the logarithm.



The next step in the evaluation of (B-11) is the observation that in the limit as $K \to \infty$, the integrand therein is dominated by the exponential term with maximal exponent. Therefore, only the subshell that corresponds to this extremal value of the correlation between the vectors $\{x_a\}$ is relevant for the calculation of the integral. Thus, we have at the saddle point

$$ \frac{\partial}{\partial Q}\left[\mathscr{G}(Q) + \mathrm{Tr}(\tilde{Q}Q)\right] = 0. \tag{B-32} $$
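
The saddle point argument rests on the fact that $\frac{1}{K}\log\int e^{Kf(q)}\,dq$ converges to $\max_q f(q)$ as $K$ grows. The following Python sketch illustrates this for a hypothetical one-dimensional exponent $f(q) = q(1-q)$, which is an assumption chosen for simplicity and not the exponent of (B-11).

```python
import math


def log_integral_rate(f, lo, hi, K, steps=20001):
    """(1/K) * log of the integral of exp(K * f(q)) over [lo, hi] (trapezoidal rule)."""
    h = (hi - lo) / (steps - 1)
    vals = [f(lo + i * h) for i in range(steps)]
    f_max = max(vals)  # factor out the peak to avoid overflow for large K
    total = sum(math.exp(K * (v - f_max)) for v in vals)
    total -= 0.5 * (math.exp(K * (vals[0] - f_max)) + math.exp(K * (vals[-1] - f_max)))
    return f_max + math.log(total * h) / K


def exponent(q):
    # Hypothetical smooth exponent with a single maximum of 0.25 at q = 0.5.
    return q * (1.0 - q)


for K in (10, 100, 10000):
    print(K, log_integral_rate(exponent, 0.0, 1.0, K))
```

The printed values approach 0.25, the maximum of the exponent, with a correction of order $(\log K)/K$ that disappears under the $1/K$ normalization.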



Since the trace is the sum of the eigenvalues, we can write (B-13) as

$$ \mathscr{G}(Q) = \mathrm{Tr}\int_0^{\beta Q} R(-w)\,dw \tag{B-33} $$

and (B-32) gives

$$ \tilde{Q} = -\beta R(-\beta Q). \tag{B-34} $$



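Furthermore, we observe that also the integrand in (B-28) is dominated by the exponential term with maximal exponent in the limit $K \to \infty$. Thus, at the saddle point we have

$$ \frac{\partial}{\partial\tilde{Q}}\left[\log M(\tilde{Q}) - \mathrm{Tr}(\tilde{Q}Q)\right] = 0. \tag{B-35} $$

With (B-31), this gives

$$ Q = \int\frac{\sum_{x\in\mathscr{B}_u^n} x x^\dagger\, e^{x^\dagger\tilde{Q}x + h\beta\sum_{a=1}^n 1\{(x_a,u)=(\xi,v)\}}}{\sum_{x\in\mathscr{B}_u^n} e^{x^\dagger\tilde{Q}x + h\beta\sum_{a=1}^n 1\{(x_a,u)=(\xi,v)\}}}\, dF_U(u). \tag{B-36} $$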



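We now invoke the 1RSB assumption (3-31) regarding the structure of the matrices $Q$ at the saddle-point that dominate the integral. In a similar manner we set

$$ \tilde{Q} = \beta^2 f_1^2\,\mathbf{1}_{n\times n} + \beta^2 g_1^2\, I_{\frac{n\beta}{\mu_1}}\otimes\mathbf{1}_{\frac{\mu_1}{\beta}\times\frac{\mu_1}{\beta}} - \beta\varepsilon_1 I_{n\times n} \tag{B-37} $$

introducing the macroscopic parameters $f_1$, $g_1$, and $\varepsilon_1$.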

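With these assumptions one can explicitly obtain the eigenvalues of the matrix $\beta Q$ (see footnote 14), and $\mathscr{G}(Q)$ can be rewritten as

$$ \mathscr{G}(q_1,p_1,\chi_1,\mu_1) = \left(n - \frac{n\beta}{\mu_1}\right)\int_0^{\chi_1} R(-w)\,dw + \left(\frac{n\beta}{\mu_1} - 1\right)\int_0^{\chi_1+\mu_1p_1} R(-w)\,dw + \int_0^{\chi_1+\mu_1p_1+\beta nq_1} R(-w)\,dw. \tag{B-38} $$

14 The eigenvalue $(\beta nq_1 + \mu_1p_1 + \chi_1)$ occurs with multiplicity 1, the eigenvalue $(\mu_1p_1 + \chi_1)$ occurs with multiplicity $(\frac{n\beta}{\mu_1} - 1)$, and the eigenvalue $\chi_1$ occurs with multiplicity $(n - \frac{n\beta}{\mu_1})$.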
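
The eigenvalue structure stated in footnote 14 can be verified on a small example. The Python sketch below assumes the 1RSB form $Q = q_1\mathbf{1}_{n\times n} + p_1 I_{n\beta/\mu_1}\otimes\mathbf{1}_{\mu_1/\beta\times\mu_1/\beta} + \frac{\chi_1}{\beta}I_{n\times n}$, which is not reproduced in this appendix and is therefore an assumption here, together with hypothetical integer sizes and parameter values. It checks one eigenvector from each of the three eigenspaces.

```python
def one_rsb_matrix(n, block, q1, p1, chi1, beta):
    """beta * Q for the assumed 1RSB form; `block` plays the role of mu1 / beta."""
    mat = [[beta * q1 for _ in range(n)] for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a // block == b // block:  # same diagonal block
                mat[a][b] += beta * p1
        mat[a][a] += chi1
    return mat


def apply(mat, vec):
    """Matrix-vector product for plain Python lists."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


# Hypothetical toy sizes: n = 6 replicas in n*beta/mu1 = 2 blocks of size mu1/beta = 3.
n, block = 6, 3
q1, p1, chi1, beta = 0.4, 0.7, 0.2, 1.5
mu1 = block * beta
M = one_rsb_matrix(n, block, q1, p1, chi1, beta)

ones = [1.0] * n                              # all-ones vector
between = [1.0] * block + [-1.0] * block      # contrasts the two blocks
within = [1.0, -1.0] + [0.0] * (n - 2)        # contrasts two entries inside one block

print(apply(M, ones)[0], beta * n * q1 + mu1 * p1 + chi1)
print(apply(M, between)[0], mu1 * p1 + chi1)
print(apply(M, within)[0], chi1)
```

Each pair of printed numbers agrees, matching the three eigenvalues of the footnote; their multiplicities (1, 1, and 4 in this toy case) add up to $n$.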



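It also follows from the 1RSB assumption that

$$ \mathrm{Tr}(\tilde{Q}Q) = \begin{bmatrix}\beta^2 f_1^2 & \beta^2 g_1^2 & -\beta\varepsilon_1\end{bmatrix}\begin{bmatrix} n^2 & \frac{n\mu_1}{\beta} & n \\ \frac{n\mu_1}{\beta} & \frac{n\mu_1}{\beta} & n \\ n & n & n \end{bmatrix}\begin{bmatrix} q_1 \\ p_1 \\ \frac{\chi_1}{\beta}\end{bmatrix} \tag{B-39} $$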



and 



logM(/i,ffi,£i,^i) 



log c 



ft\ E xa\ +P 2 gi E 

\a=l 1 = 



E <n ,hn 
1=1 a +T 



-0S1 E \x a \ 2 +hP £ l{(x a ,u)=(Z,v)} 



dFu(u) . 
(B-40) 



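Due to (B-32), the partial derivatives of

$$ \mathscr{G}(q_1,p_1,\chi_1,\mu_1) + \mathrm{Tr}(\tilde{Q}Q) \tag{B-41} $$

with respect to $q_1$, $p_1$, and $\chi_1$ must vanish as $K \to \infty$ by definition of the saddle point. Using (B-38) and (B-39) this yields the following set of equations

$$ 0 = n^2\beta^2 f_1^2 + n\beta\mu_1 g_1^2 - n\beta\varepsilon_1 + n\beta R(-\chi_1 - \mu_1p_1 - \beta nq_1), \tag{B-42} $$

$$ 0 = n\beta\mu_1 f_1^2 + n\beta\mu_1 g_1^2 - n\beta\varepsilon_1 + (n\beta - \mu_1)R(-\chi_1 - \mu_1p_1) + \mu_1 R(-\chi_1 - \mu_1p_1 - n\beta q_1), \tag{B-43} $$

$$ 0 = n\beta f_1^2 + n\beta g_1^2 - n\varepsilon_1 + \left(n - \frac{n\beta}{\mu_1}\right)R(-\chi_1) + \left(\frac{n\beta}{\mu_1} - 1\right)R(-\chi_1 - \mu_1p_1) + R(-\chi_1 - \mu_1p_1 - n\beta q_1). \tag{B-44} $$

Solving for $\varepsilon_1$, $g_1$, and $f_1$, while focusing on the limit as $n \to 0$, one gets

$$ \varepsilon_1 = R(-\chi_1), \tag{B-45} $$

$$ g_1 = \sqrt{\frac{R(-\chi_1) - R(-\chi_1 - \mu_1p_1)}{\mu_1}}, \tag{B-46} $$

$$ f_1 = \sqrt{\frac{R(-\chi_1 - \mu_1p_1) - R(-\chi_1 - \mu_1p_1 - n\beta q_1)}{n\beta}} \;\xrightarrow[n\to 0]{}\; \sqrt{q_1 R'(-\chi_1 - \mu_1p_1)}. \tag{B-47} $$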



We now rewrite the expression for $M(f_1, g_1, \varepsilon_1, \mu_1)$ in (B-40) using the Hubbard-Stratonovich transform and the shortened notation of (4-6)

$$ e^{|x|^2} = \int_{\mathbb{C}} e^{2\Re\{x z^*\}}\, \mathrm{D}z \tag{B-48} $$
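yielding (cf. [24, (66)-(70)])

$$ \log M(f_1,g_1,\varepsilon_1,\mu_1) = \int\log\sum_{\{x_a\in\mathscr{B}_u\}}\int e^{\sum_{a=1}^n\left[2\beta f_1\Re\{x_a z^*\} - \beta\varepsilon_1|x_a|^2 + h\beta 1\{(x_a,u)=(\xi,v)\}\right] + \beta^2 g_1^2\sum_{l=0}^{\frac{n\beta}{\mu_1}-1}\left|\sum_{a=1}^{\mu_1/\beta} x_{a+l\mu_1/\beta}\right|^2}\,\mathrm{D}z\, dF_U(u) \tag{B-49} $$

$$ = \int\log\int\left(\int\left(\sum_{x\in\mathscr{B}_u}\mathcal{K}(u,x,y,z)\right)^{\frac{\mu_1}{\beta}}\mathrm{D}y\right)^{\frac{n\beta}{\mu_1}}\mathrm{D}z\, dF_U(u), \tag{B-50} $$

with

$$ \mathcal{K}(u,x,y,z) = e^{2\beta\Re\{x(f_1 z^* + g_1 y^*)\} - \beta\varepsilon_1|x|^2 + h\beta 1\{(x,u)=(\xi,v)\}}. \tag{B-51} $$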
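
The Hubbard-Stratonovich identity (B-48), which linearizes the quadratic terms in the exponent, can be checked numerically. The Python sketch below assumes that $\mathrm{D}z$ denotes the complex Gaussian measure $e^{-|z|^2}\,dz/\pi$ (the definition in (4-6) is not reproduced in this appendix, so this is an assumption) and uses a hypothetical test value of $x$.

```python
import math


def gaussian_average_1d(c, half_width=12.0, steps=4801):
    """Integral of exp(2*c*t - t*t) / sqrt(pi) over the real line (trapezoidal rule)."""
    h = 2.0 * half_width / (steps - 1)
    total = 0.0
    for i in range(steps):
        t = -half_width + i * h
        weight = 0.5 if i in (0, steps - 1) else 1.0
        total += weight * math.exp(2.0 * c * t - t * t)
    return total * h / math.sqrt(math.pi)


def hubbard_stratonovich_rhs(x: complex) -> float:
    """Integral of exp(2*Re{x z^*}) against the assumed measure exp(-|z|^2)/pi dz."""
    # With z = a + jb, 2*Re{x z^*} = 2*Re(x)*a + 2*Im(x)*b, so the integral factorizes.
    return gaussian_average_1d(x.real) * gaussian_average_1d(x.imag)


x = 0.8 - 1.1j  # hypothetical test point
lhs = math.exp(abs(x) ** 2)
rhs = hubbard_stratonovich_rhs(x)
print(lhs, rhs)
```

Both numbers coincide to numerical precision, which is the property that allows the squared sums over replicas to be replaced by Gaussian integrals over $z$ and $y$.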



Due to (B-35), the partial derivatives of

$$ \log M(f_1,g_1,\varepsilon_1,\mu_1) - \mathrm{Tr}(\tilde{Q}Q) \tag{B-52} $$

with respect to $f_1$, $g_1$ and $\varepsilon_1$ must also vanish as $K \to \infty$. This produces the following set of equations (while taking the limit as $n \to 0$)



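$$ \chi_1 + p_1\mu_1 = \frac{1}{f_1}\int\!\!\int\frac{\int\left(\sum_{x\in\mathscr{B}_u}\mathcal{K}(u,x,y,z)\right)^{\frac{\mu_1}{\beta}-1}\sum_{x\in\mathscr{B}_u}\Re\{xz^*\}\,\mathcal{K}(u,x,y,z)\,\mathrm{D}y}{\int\left(\sum_{x\in\mathscr{B}_u}\mathcal{K}(u,x,y,z)\right)^{\frac{\mu_1}{\beta}}\mathrm{D}y}\,\mathrm{D}z\, dF_U(u), \tag{B-53} $$

$$ \chi_1 + (q_1 + p_1)\mu_1 = \frac{1}{g_1}\int\!\!\int\frac{\int\left(\sum_{x\in\mathscr{B}_u}\mathcal{K}(u,x,y,z)\right)^{\frac{\mu_1}{\beta}-1}\sum_{x\in\mathscr{B}_u}\Re\{xy^*\}\,\mathcal{K}(u,x,y,z)\,\mathrm{D}y}{\int\left(\sum_{x\in\mathscr{B}_u}\mathcal{K}(u,x,y,z)\right)^{\frac{\mu_1}{\beta}}\mathrm{D}y}\,\mathrm{D}z\, dF_U(u), \tag{B-54} $$

$$ q_1 + p_1 + \frac{\chi_1}{\beta} = \int\!\!\int\frac{\int\left(\sum_{x\in\mathscr{B}_u}\mathcal{K}(u,x,y,z)\right)^{\frac{\mu_1}{\beta}-1}\sum_{x\in\mathscr{B}_u}|x|^2\,\mathcal{K}(u,x,y,z)\,\mathrm{D}y}{\int\left(\sum_{x\in\mathscr{B}_u}\mathcal{K}(u,x,y,z)\right)^{\frac{\mu_1}{\beta}}\mathrm{D}y}\,\mathrm{D}z\, dF_U(u). \tag{B-55} $$

The parameter $\mu_1$ should also be chosen such that the partial derivative of

$$ \mathscr{G}(q_1,p_1,\chi_1,\mu_1) + \mathrm{Tr}(\tilde{Q}Q) - \log M(f_1,g_1,\varepsilon_1,\mu_1) \tag{B-56} $$

with respect to $\mu_1$ vanishes. This yields at the limit as $n \to 0$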



= - 



Xl+A»lPl 

~2 / + qigl + 



s 



E ^(u,x,y,z) 



/^l/ E £{u,x,y,z) By 



log fc^^'y^) D v 



~2 l0g ( / ( X! ^("'^'J/' 2 ) ) D ^ 



Dzdi^H . (B-57) 



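Incorporating all previous results, we get that the quantity $\Xi_n$ of (B-7) is equal to

$$ \Xi_n = \mathscr{I}(Q) + \mathscr{L} - \mathscr{G}(Q) $$

$$ = \log M(f_1,g_1,\varepsilon_1,\mu_1) - \beta^2 f_1^2 q_1 n^2 - \left[\beta f_1^2(p_1\mu_1 + \chi_1) + \beta g_1^2\left((q_1+p_1)\mu_1 + \chi_1\right) - \beta\varepsilon_1\left(q_1 + p_1 + \frac{\chi_1}{\beta}\right)\right] n - \left(n - \frac{n\beta}{\mu_1}\right)\int_0^{\chi_1} R(-w)\,dw - \left(\frac{n\beta}{\mu_1} - 1\right)\int_0^{\chi_1+\mu_1p_1} R(-w)\,dw - \int_0^{\chi_1+\mu_1p_1+\beta nq_1} R(-w)\,dw, \tag{B-58} $$

where the macroscopic parameters $\{f_1, g_1, \varepsilon_1, q_1, p_1, \chi_1, \mu_1\}$ are obtained from the saddle point fixed-point equations (B-45)-(B-47), (B-53)-(B-55), and (B-57). Now in view of (B-5), the next step in the derivation is to take the limit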



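$$ \lim_{n\to 0}\frac{1}{n}\Xi_n = \frac{\beta}{\mu_1}\int\!\!\int\log\left(\int\left(\sum_{x\in\mathscr{B}_u}\mathcal{K}(u,x,y,z)\right)^{\frac{\mu_1}{\beta}}\mathrm{D}y\right)\mathrm{D}z\, dF_U(u) - \beta f_1^2(\chi_1 + p_1\mu_1) - \beta g_1^2\left(\chi_1 + (p_1+q_1)\mu_1\right) + \beta\varepsilon_1\left(q_1 + p_1 + \frac{\chi_1}{\beta}\right) + \left(\frac{\beta}{\mu_1} - 1\right)\int_0^{\chi_1} R(-w)\,dw - \frac{\beta}{\mu_1}\int_0^{\chi_1+\mu_1p_1} R(-w)\,dw - \beta q_1 R(-\chi_1 - \mu_1p_1). \tag{B-59} $$

But in fact

$$ \lim_{n\to 0}\frac{1}{n}\Xi_n = -\beta\mathscr{F}(\beta,h), \tag{B-60} $$

which is justified by the observation that $\Xi_n$ converges to the same limit for almost every realization of $u$, applying the law of large numbers in (B-30) (see also (B-3) and the discussion that follows). We note at this point that the energy penalty $\bar{\mathscr{E}}_{\mathrm{rsb1}}$ satisfies

$$ \bar{\mathscr{E}}_{\mathrm{rsb1}} = \lim_{\beta\to\infty}\mathscr{F}(\beta,h)\big|_{h=0}, \tag{B-61} $$

and (4-12) can be readily expressed from Proposition A.1 as a function of the macroscopic parameters $\{q_1, p_1, \chi_1, \mu_1\}$. In order to evaluate the energy penalty, it is thus left to derive the fixed point equations that determine these parameters, as given by (4-8)-(4-11), which are obtained by substituting $h = 0$ and taking the limit as $\beta \to \infty$ in (B-53)-(B-55), and (B-57) after back-substitution of (B-51). This completes the proof of Proposition 4.1.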



B.2 Proposition 4.2 



We derive the limiting conditional distribution of the precoder output $x$ given the input $u$ starting from (3-17). We therefore need to evaluate the derivative of the free energy with respect to $h$. This can be done directly given (B-60). Taking the partial derivative in (B-59) and using (B-51), we get



Pu(v) ah 



h=0 

£1 1 

^2/33t{x(f 1 z'+g 1 y')}-0e 1 \x\ 2 \ " 



in 



x£S3 u 



,2mMflz*+9iy*)}-Pei\x\ 2 



Dj/Dz 



l{(x,u) = (£,v)} 



Pu(v) 



dF v (u) 



^2x£ 



— — 1 

,20K{x(fiz' +giy*)}-Pei\x\ 2 \ 13 



(B-62) 



,2/3K{?(/i Z *+ 9 iy*)}-/3ei|C| 2 



(B-63) 



DyDz 



(B-64) 
After taking the limit $\beta \to \infty$, while applying the saddle point integration rule, we finally get (4-13). This completes the proof of Proposition 4.2.



B.3 Proposition 4.5



We start from ( pHo) ), ( pTl9| ), and p^20] > which yield 



d/3 



(B-65) 



The partial derivative with respect to $\beta$ above reflects the fact that all implicit dependencies of the free energy on $\beta$ through its dependence on other parameters, e.g., $f_1$, $q_1$, $\chi_1$, $\mu_1$, have vanishing derivatives since the free energy is evaluated at a saddle point. Making use of the saddle point equations, we find that



P wt, = Xi 1 R{-Xi) +P[—+Pi + 

OP V Mi/ \Mi 



9i J #(~Xi - MiPi) ~ P<liR'(-Xi - MiPi). 

(B-66) 

Next, we need to analyze the behavior of $\beta\mathscr{F}(\beta)$ for large $\beta$. This can be seen directly through (B-59), (B-60). The first term in (B-59) can be shown to be of the form $\beta A + O(\beta^{-1})$, where $A \in \mathbb{R}$ is some constant. The reason for this behavior stems from the fact that for a discrete alphabet the corrections to the leading order term are exponential in $\beta$, except for a small $O(\beta^{-1})$ region close to the nearest neighbor points in the lattice. Therefore, to order $\beta^{-1}$ the first term in (B-59) is simply $\beta$ times its partial derivative with respect to $\beta$. This enables us to evaluate the value of $\beta\mathscr{F}(\beta)$ for large $\beta$ to the order $\beta^{-1}$ as follows:



d_ 
d/3 




9 



,2/3K{x(/iz* +g 1 y')}-/3e 1 \x\ 2 



By DzdFu(u) 



PftiXi + P1M1) ~ Pg'tiXi + (Pi + <?i)Mi) + Peifai + Pi + ^j) 



Ml 



Xl /-Xl+MlPl 

R{—w) dw / 

Mi Jo 



R(-w)dw-/3 qi R(- X i-»iPi) ■ (B-67) 



Using the fixed point equations we may re-express the first line as follows: 



Mi 



Xl 



R(-w)dw + /3qiR(-Xi - MiPi) + 2/3gii?'(-xi - MiPi) 
2faifl(-;a) 



xi-R(-xi) 



Mi 



- 2/3i?(- X i ~ MiPi) 



Xl 
Mi 



9i - Pi 



(B-68) 
Plugging this into the above equation and using (B-45)-(B-47), we eventually get the following equation for the zero-temperature entropy



& = xiR(-xi) 



Xi 



R(—w)dw 



(B-69) 



Remarkably, the above equation for the entropy holds also for the RS case. To recover the RS structure of the equations above we start with $\mu_1/\beta = 1$ and $\chi_0 = \chi_1 + \mu_1p_1$. Then, we find that $q_0 = q_1$, $\varepsilon_0 = \varepsilon_1 - \beta g_1^2$, and $f_0 = f_1$. After that we find the equations to reduce to the RS case analyzed in [24].



C Proof of Proposition 3.1 



We will now apply (3-9) to express the energy in a compact fashion. We start by considering the representation of the normalized average free energy in terms of $Q$ (see (3-19) and (3-24)), and let us denote this representation, for the sake of clarity, as $\mathscr{F}(Q,\beta)$. In general, the replica crosscorrelation matrix $Q$ depends on $\beta$. However, at the saddle point we have (by definition)

$$ \frac{\partial\mathscr{F}(Q,\beta)}{\partial Q} = 0. \tag{C-1} $$

Thus, the total derivative in (3-9) becomes a partial derivative at the saddle point, i.e.

$$ \mathscr{E}(\beta) = \frac{\partial}{\partial\beta}\left(\beta\mathscr{F}(Q,\beta)\right). \tag{C-2} $$

Referring to the proof in Appendix B, then with (B-2), (B-5), (B-7), (B-11), and (B-15), while substituting $h = 0$, this gives

$$ \mathscr{E}(\beta) = \lim_{n\to 0}\frac{1}{n}\frac{\partial}{\partial\beta}\int_0^{\beta}\mathrm{Tr}\left[Q R(-wQ)\right] dw \tag{C-3} $$

which is easily shown to be equivalent to (3-25). Furthermore, we get (3-26) by plugging (B-34) into (B-36) while substituting $h = 0$.



D Discrete Lattice Relaxation: Small $\chi_1$ Approximation Near Unit Load (1RSB)

This appendix provides an approximate derivation of the 1RSB equations for the discrete lattice-based alphabet relaxation scheme of Section 6, while assuming a Gaussian $H$ and a ZF front-end. The approximation is based on the numerical observation that the macroscopic parameter $\chi_1$, employed in the 1RSB ansatz for this setting, approaches zero as the system load gets close to unity. This approximation considerably simplifies the numerical solution of the 1RSB equations in this region of the system load.



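D.1 Case I: $\alpha = 1$

For $\alpha = 1$, the $R$-transform of $J = T^\dagger T$ (see Proposition 4.1) satisfies

$$ R(-w) = \frac{\alpha - 1 + \sqrt{(1-\alpha)^2 + 4\alpha w}}{2\alpha w} \underset{\alpha=1}{=} \frac{1}{\sqrt{w}}, \tag{D-1} $$

$$ R'(-w) = \frac{\left(1 - \alpha - \sqrt{(1-\alpha)^2 + 4\alpha w}\right)^2}{4\alpha w^2\sqrt{(1-\alpha)^2 + 4\alpha w}} \underset{\alpha=1}{=} \frac{1}{2w^{3/2}}. \tag{D-2} $$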
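
The closed forms in (D-1) and (D-2) are straightforward to evaluate, and the derivative can be cross-checked against a finite difference of $R(-w)$. A minimal Python sketch follows; the function names and test points are illustrative choices.

```python
import math


def r_transform(w: float, alpha: float) -> float:
    """R(-w) as given in (D-1)."""
    root = math.sqrt((1.0 - alpha) ** 2 + 4.0 * alpha * w)
    return (alpha - 1.0 + root) / (2.0 * alpha * w)


def r_transform_derivative(w: float, alpha: float) -> float:
    """R'(-w) as given in (D-2); the prime is the derivative with respect to the argument."""
    root = math.sqrt((1.0 - alpha) ** 2 + 4.0 * alpha * w)
    return (1.0 - alpha - root) ** 2 / (4.0 * alpha * w * w * root)


# Unit load: R(-w) = 1/sqrt(w) and R'(-w) = 1/(2 w^(3/2)).
w = 0.25
print(r_transform(w, 1.0), 1.0 / math.sqrt(w))
print(r_transform_derivative(w, 1.0), 1.0 / (2.0 * w ** 1.5))

# General load: since d/dw R(-w) = -R'(-w), a central difference gives a cross-check.
alpha, w, h = 0.6, 0.3, 1e-5
finite_diff = -(r_transform(w + h, alpha) - r_transform(w - h, alpha)) / (2.0 * h)
print(r_transform_derivative(w, alpha), finite_diff)
```

The sign convention matters here: differentiating $R(-w)$ with respect to $w$ produces $-R'(-w)$, which is why the finite difference is negated.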



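Considering the small $\chi_1$ regime, we get from (4-2)-(4-4)

$$ \varepsilon_1 = \frac{1}{\sqrt{\chi_1}}, \tag{D-3} $$

$$ g_1 = \sqrt{\frac{1}{\mu_1}\left(\frac{1}{\sqrt{\chi_1}} - \frac{1}{\sqrt{\chi_1 + \mu_1p_1}}\right)} \underset{\chi_1\ll 1}{\approx} \sqrt{\frac{1}{\mu_1\sqrt{\chi_1}}}, \tag{D-4} $$

$$ f_1 \underset{\chi_1\ll 1}{\approx} \sqrt{q_1 R'(-\mu_1p_1)} = \sqrt{\frac{q_1}{2(\mu_1p_1)^{3/2}}}. \tag{D-5} $$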

Particularizing to the two-dimensional discrete lattice-based alphabet relaxation scheme under consideration, one gets from (6-6)



MO 



ZlVk - M 



91 xi<i % /Xi 



r ^WTx = ^\ C±± ^= 1 V|e|<oo. (D-6) 



Xi 



We now rewrite the function Sfc(£) of (6-7) as 

6 fc (0 A e ^[M-«)<4+»«] [Q (^2(^(0 - /ii.9ic fc )) - Q (V2(^ fc+1 (0 - MiSiCfc)) 



and observe the following. Starting with exponential argument, we get 

Cfc 



Xl<l 



while the arguments of the Q(-) functions satisfiy 
i>k(0 ~ [i-\9\Ck 



+ 2M 



(D-7) 



(D- 



X1<1 v 4 

Al 



Ml Cfc + Cfe_l ^//il 

—Cfe 



Mi c fc - c fc _i 



X? 



and 



, , ts VMl Cfe + i + C fc ^//ii y'/ii Cfc + i - Cfc 

%+i(0 - MiSiCfe ~ 1 — i ^ ^Cfc = — E 

Xi Xi Xi 



(D-9) 



(D-10) 
Now recall that from the underlying definition of the extended alphabet set, it follows that $c_k > c_{k-1}$ $\forall k$, and it can hence be concluded that



xi<i I -.vw^'- 1 



Cfc-i = -co, \Ck\ < oo 
-> -oo |c fc _i| , \c k \ < oo , 



(D-ll) 



and 



f^^oo \c k \, \c k+1 \ <oo 

■ OO Cfe < OO, |Cfc+l| = oo . 



This implies that 



Xi->0 



Xi-i-0 



> 1 Vfc 



> Vfc . 



(D-12) 



(D-13) 
(D-14) 



We therefore conclude that 



e fc (0 « e 



3 MlCfc [(^ig?-ei)c fc +2/i{] 



Xl<l 



(D-15) 



In a similar manner one can observe that the exponential terms in the RHS of (6-8) vanish as $\chi_1 \to 0$, and conclude that

(D-16) 



^(0^0 Vfc 



In view of the above we can now restate the coupled equations that determine the macroscopic parameters $q_1$, $p_1$, and $\mu_1$ in the following way (cf. (6-9)-(6-12), and note that the equation for determining $\chi_1$ can be ignored):



91 



Pi 



Xi<l 



1«1 /; 



X1<1 



ELi^e m (Q c ., 2 d^ 
2_ f E L m =ic m e m (Q e dg 



(D-17) 
(D-18) 
(D-19) 



where we used (D-1)-(D-5) to obtain

$$ \int_0^{\mu_1p_1} R(-w)\,dw = 2\sqrt{\mu_1p_1}, \qquad R'(-\mu_1p_1) = \frac{1}{2(\mu_1p_1)^{3/2}}. \tag{D-20} $$



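The energy penalty in this case is given by (cf. (4-12))

$$ \bar{\mathscr{E}}_{\mathrm{rsb1}} \underset{\chi_1\ll 1}{\approx} \frac{q_1 + p_1}{\sqrt{\mu_1p_1}} - \frac{q_1\mu_1p_1}{2(\mu_1p_1)^{3/2}} = \frac{q_1 + 2p_1}{2\sqrt{\mu_1p_1}}. \tag{D-21} $$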
D.2 Case II: $\alpha < 1$, $\alpha \to 1$

In a similar manner to the previous section, we start with the $R$-transform of $J = T^\dagger T$, and rewrite it for small $w$, using the Taylor expansion around $w = 0$, as

$$ R(-w) = \frac{-(1-\alpha) + \sqrt{(1-\alpha)^2 + 4\alpha w}}{2\alpha w} \underset{w\ll 1}{\approx} \frac{-(1-\alpha) + (1-\alpha)\left(1 + \frac{2\alpha}{(1-\alpha)^2}w - \frac{2\alpha^2}{(1-\alpha)^4}w^2 + o(w^2)\right)}{2\alpha w} \underset{w\ll 1}{\approx} \frac{1}{1-\alpha} - \frac{\alpha}{(1-\alpha)^3}w + O(w^2). \tag{D-22} $$
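
How small $w$ has to be for the expansion (D-22) to be accurate can be seen numerically. The Python sketch below compares the exact $R(-w)$ with its first-order expansion for a hypothetical load of $\alpha = 0.9$; the load and the test values of $w$ are assumptions chosen for illustration.

```python
import math


def r_exact(w: float, alpha: float) -> float:
    """Exact R(-w) from the first expression in (D-22)."""
    root = math.sqrt((1.0 - alpha) ** 2 + 4.0 * alpha * w)
    return (alpha - 1.0 + root) / (2.0 * alpha * w)


def r_small_w(w: float, alpha: float) -> float:
    """First-order expansion 1/(1-alpha) - alpha*w/(1-alpha)^3 from (D-22)."""
    return 1.0 / (1.0 - alpha) - alpha * w / (1.0 - alpha) ** 3


alpha = 0.9  # hypothetical load close to unity
for w in (1e-3, 1e-4, 1e-5):
    exact, approx = r_exact(w, alpha), r_small_w(w, alpha)
    print(w, exact, approx, abs(exact - approx))
```

The error drops quickly once $w$ is well below $(1-\alpha)^2$, which is the scale set by the expansion variable $4\alpha w/(1-\alpha)^2$; closer to unit load the expansion is therefore valid only on a shrinking range of $w$.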

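We focus in the following on the regime in which $\alpha \to 1$, so that $\frac{1}{1-\alpha} \gg 1$, while $\chi_1$ is
still small enough for the expansion (D-22) to apply. It hence follows that

$$\varepsilon_1 \underset{\chi_1 \ll 1}{\approx} \frac{1}{1-\alpha} \tag{D-23}$$

$$g_1 \underset{\chi_1 \ll 1}{\approx} \sqrt{\frac{1}{\mu_1}\left[\frac{1}{1-\alpha} - R(-\mu_1 p_1)\right]} \underset{\chi_1 \ll 1,\ \alpha \to 1}{\approx} \frac{1}{\sqrt{\mu_1 (1-\alpha)}} \tag{D-24}$$

$$f_1 \underset{\chi_1 \ll 1}{\approx} \sqrt{q_1 R'(-\mu_1 p_1)} . \tag{D-25}$$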
(D-25) 



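Particularizing again to the two-dimensional discrete lattice alphabet relaxation scheme for QPSK
signaling, it follows from (6-6) that

$$\psi_k(\xi) = \frac{\varepsilon_1 v_k - f_1 \xi}{g_1} \underset{\chi_1 \ll 1,\ \alpha \to 1}{\approx} \sqrt{\frac{\mu_1}{1-\alpha}}\ \frac{c_k + c_{k-1}}{2} , \quad \forall |\xi| < \infty . \tag{D-26}$$

Considering (6-7) we write

$$(\mu_1 g_1^2 - \varepsilon_1) c_k + 2 f_1 \xi \underset{\chi_1 \ll 1}{\approx} -R(-\mu_1 p_1) c_k + 2 f_1 \xi . \tag{D-27}$$

Next, the arguments of the $Q(\cdot)$ functions in (6-7) satisfy

$$\psi_k(\xi) - \mu_1 g_1 c_k \underset{\chi_1 \ll 1,\ \alpha \to 1}{\approx} \sqrt{\frac{\mu_1}{1-\alpha}}\ \frac{c_k + c_{k-1}}{2} - \sqrt{\frac{\mu_1}{1-\alpha}}\ c_k = -\sqrt{\frac{\mu_1}{1-\alpha}}\ \frac{c_k - c_{k-1}}{2} \tag{D-28}$$

and

$$\psi_{k+1}(\xi) - \mu_1 g_1 c_k \underset{\chi_1 \ll 1,\ \alpha \to 1}{\approx} \sqrt{\frac{\mu_1}{1-\alpha}}\ \frac{c_{k+1} + c_k}{2} - \sqrt{\frac{\mu_1}{1-\alpha}}\ c_k = \sqrt{\frac{\mu_1}{1-\alpha}}\ \frac{c_{k+1} - c_k}{2} . \tag{D-29}$$

This enables us to conclude that

$$\psi_k(\xi) - \mu_1 g_1 c_k \xrightarrow[\alpha \to 1]{} -\infty \tag{D-30}$$

$$\psi_{k+1}(\xi) - \mu_1 g_1 c_k \xrightarrow[\alpha \to 1]{} \infty , \tag{D-31}$$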
and hence

$$\Theta_k(\xi) \approx e^{\mu_1 c_k \left[ (\mu_1 g_1^2 - \varepsilon_1) c_k + 2 f_1 \xi \right]} \approx e^{\mu_1 c_k \left[ 2 f_1 \xi - R(-\mu_1 p_1) c_k \right]} \tag{D-32}$$



and 



Xl<< °^S , Vfc . (D-33) 



Finally, note that $\int_0^{\mu_1 p_1} R(-w)\, dw$ exists for $\alpha < 1$, and the approximation

MiXiffi > (D-34) 

is employed to derive the three coupled equations that determine the macroscopic parameters $q_1$,
$p_1$, and $\mu_1$. The three equations are thus

' 2/ eLi <=>»>«) 7f" B ' ( > 



Xi<l,a->1 

2 J log (ELi e m (0) e"« 2 ^ - J MIP1 fl(-u>) dtfl + ^fa + 2p 1 )i?(-MiPi) 
1 Xl <i,a-n 2q 1 fx 1 p 1 R'(-nip 1 ) 

(D-37) 

The expression for the energy penalty is given by

$$\mathcal{E}_{\mathrm{rsb1}} \underset{\chi_1 \ll 1,\ \alpha \to 1}{\approx} (q_1 + p_1) R(-\mu_1 p_1) - q_1 \mu_1 p_1 R'(-\mu_1 p_1) . \tag{D-38}$$

We also note that the exact expressions for the $R$-transform and its derivative were employed for
the purpose of producing more accurate numerical results, while using this small-$\chi_1$ approximation
for $\alpha < 1$.



E Proof of Lemma 4.6 



The Stieltjes transform of the probability distribution $F(x)$ is defined by

$$m(s) = \int \frac{dF(x)}{x - s} . \tag{E-1}$$

In terms of the Stieltjes transform, the $R$-transform is defined as

$$R(w) = m^{-1}(-w) - \frac{1}{w} , \tag{E-2}$$

where $m^{-1}(s)$ denotes the inverse function of $m(s)$ with respect to composition, i.e., $m(m^{-1}(s)) = s$.

We start with the observation that the derivative of the Stieltjes transform is lower bounded
by its square

$$m'(s) = \int \frac{dF(x)}{(x - s)^2} \ge \left( \int \frac{dF(x)}{x - s} \right)^2 = [m(s)]^2 \tag{E-3}$$

by means of Jensen's inequality, with equality if and only if the distribution $F(x)$ is a single mass
point. Next, we consider the derivative of the $R$-transform. Letting $s = m^{-1}(-w)$, i.e., $w = -m(s)$,
it follows that

$$R'(w) = \frac{d\, m^{-1}(-w)}{dw} + \frac{1}{w^2} \tag{E-4}$$

$$= -\frac{1}{m'(s)} + \frac{1}{[m(s)]^2} \ge 0 , \tag{E-5}$$

with equality if and only if the distribution $F(x)$ is a single mass point. Lemma 4.6 then follows
immediately.
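The two steps of the proof can be illustrated numerically for a discrete distribution. The sketch below assumes the convention $m(s) = \int dF(x)/(x-s)$ and evaluates $R'(w)$ at $w = -m(s)$ as $1/[m(s)]^2 - 1/m'(s)$; the atoms, weights, and function names are arbitrary illustrative choices.

```python
import numpy as np


def stieltjes(s, atoms, probs):
    """m(s) = sum_i p_i / (x_i - s) for a discrete distribution."""
    return float(np.sum(probs / (atoms - s)))


def stieltjes_derivative(s, atoms, probs):
    """m'(s) = sum_i p_i / (x_i - s)^2."""
    return float(np.sum(probs / (atoms - s) ** 2))


def r_transform_derivative(s, atoms, probs):
    """R'(w) evaluated at w = -m(s): 1/m(s)^2 - 1/m'(s)."""
    m = stieltjes(s, atoms, probs)
    dm = stieltjes_derivative(s, atoms, probs)
    return 1.0 / m**2 - 1.0 / dm


s = -1.0
# Three mass points: Jensen's inequality is strict, so R' > 0.
atoms3, probs3 = np.array([0.5, 1.0, 3.0]), np.array([0.2, 0.5, 0.3])
r_prime_three = r_transform_derivative(s, atoms3, probs3)
# Single mass point: equality holds, so R' = 0 (up to rounding).
atoms1, probs1 = np.array([2.0]), np.array([1.0])
r_prime_single = r_transform_derivative(s, atoms1, probs1)
print(r_prime_three, r_prime_single)
```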



F Spectral Efficiency of Generalized Tomlinson-Harashima 
Precoding 

For the sake of comparison, we review here the derivation of the spectral efficiency of generalized
Tomlinson-Harashima precoding (GTHP), which is another practical alternative to the capacity
achieving DPC. The approach is based on inflated lattice strategies, and borrows ideas from the
recent analysis of pulse amplitude modulation (PAM) in [54]. The spectral efficiency is derived
following [6], [55] (see also [20], [56]), while employing successive encoding using the inflated lattice
strategy at each stage, where the signals of previously encoded users are treated as causally
known interference. We consider here the "canonical" channel model as in (2-1), and note that a
comparative analysis of other variants of GTHP can be found, e.g., in [20].



The underlying idea of the scheme considered here is first to induce a "triangular" channel
structure using the LQ-factorization of the channel transfer matrix. Assuming $H$ is full rank, we
denote

$$H = LQ , \tag{F-1}$$

where $L_{[K \times K]}$ is lower triangular with positive diagonal entries and $Q_{[K \times N]}$ has orthonormal rows.
The transmitted signal is then given by

$$t = Q^\dagger x , \tag{F-2}$$

where $x$ is the nonlinear precoder's output (cf. (2-3)). The signal received by the $k$th user is thus
given by

$$r_k = L_{kk} x_k + \sum_{j=1}^{k-1} L_{kj} x_j + n_k , \tag{F-3}$$



where $\{L_{ij}\}$ denote the entries of $L$ and $x_i$ is the nonlinear precoder's output that corresponds to
user $i$. Normalizing both sides of the equation by $L_{kk}$, we get the following equivalent channel

$$\tilde{r}_k = x_k + \sum_{j=1}^{k-1} \frac{L_{kj}}{L_{kk}} x_j + \tilde{n}_k = x_k + s_k + \tilde{n}_k , \tag{F-4}$$

where we denote the multiuser interference experienced by user $k$ as $s_k = \sum_{j=1}^{k-1} \frac{L_{kj}}{L_{kk}} x_j$, and $\tilde{n}_k$ is
a zero-mean circularly symmetric complex Gaussian noise with variance $\sigma^2 / L_{kk}^2$.
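The triangular channel structure can be reproduced numerically. The following is a minimal sketch, assuming NumPy and obtaining the LQ-factorization from a QR decomposition of the conjugate transpose; `lq_factorization` is a hypothetical helper name, and the random vector `x` merely stands in for the nonlinear precoder's output.

```python
import numpy as np


def lq_factorization(H):
    """Return (L, Q) with H = L @ Q, L lower triangular with positive
    diagonal, and Q having orthonormal rows (assumes full row rank)."""
    # QR of the conjugate transpose: H^dagger = Q1 @ R1, hence H = R1^dagger @ Q1^dagger.
    Q1, R1 = np.linalg.qr(H.conj().T)
    L = R1.conj().T
    Q = Q1.conj().T
    # Rotate phases so that the diagonal of L becomes real and positive.
    phases = np.diag(L) / np.abs(np.diag(L))
    L = L * phases.conj()[np.newaxis, :]
    Q = phases[:, np.newaxis] * Q
    return L, Q


rng = np.random.default_rng(0)
K, N = 4, 6
H = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2 * N)
L, Q = lq_factorization(H)

x = rng.standard_normal(K) + 1j * rng.standard_normal(K)  # stand-in for the precoder output
t = Q.conj().T @ x       # transmitted signal, as in (F-2)
r_noiseless = H @ t      # noiseless received vector; equals L @ x, cf. (F-3)
s = (np.tril(L, -1) @ x) / np.diag(L).real  # normalized interference s_k, cf. (F-4)
```

User $k$ then only sees interference from users $1, \ldots, k-1$, which is what allows the causal, user-by-user pre-cancellation described next.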

In the GTHP setting, instead of DPC (as employed at this point, e.g., by the "zero-forcing
dirty-paper" scheme of [49]), we follow for each user the THP-type strategy described in [55]
for canceling the interference due to previously encoded users. This strategy, which applies for
canceling causally known interference, leads in the broadcast setting to a considerably reduced
complexity as it involves only scalar quantizations (as opposed to vector quantizations in the
noncausal case, see therein). To make a fair comparison to the precoding schemes discussed in
Sections 6 and 7, we particularize here to the case in which the information bearing signal takes on
binary values per dimension (so that the total spectral efficiency for quadrature modulation,
as a function of $E_b/N_0$, is twice as much as the one obtained for binary input). The spectral efficiency
for continuous input is derived as well for completeness. The basic transmission scheme is reviewed
first, while considering real channels.

The underlying real channel model is given by

$$y = x + s + n , \tag{F-5}$$

where $x$ is subject to an average power constraint $P_x$, $n$ is a zero-mean AWGN with variance $P_n$,
and $s$ is an interference signal which is known causally at the transmitter (i.e., at the current time
instance), but not at the receiver. This channel model is also referred to in the literature as the
"dirty-tape" model [55]. Consider the one-dimensional lattice

$$\Lambda = A\{\ldots, -3, -1, 1, 3, \ldots\} . \tag{F-6}$$

Let $\mathcal{V} = [-A, A)$ denote the basic Voronoi region of $\Lambda$. Let $d$ be a dither signal uniformly
distributed over $\mathcal{V}$. Under a common randomness assumption, this dither signal is assumed to be
available at the receiver as well.

Starting with continuous information bearing signals, the transmitter in the GTHP scheme sends
the signal

$$x = [v - \tilde{\alpha} s - d] \bmod \Lambda , \tag{F-7}$$

where $\tilde{\alpha} \in (0, 1]$ is referred to as the "inflation factor". The receiver scales the received signal by
$\tilde{\alpha}$, adds the dither signal, and then performs a modulo-lattice operation, yielding

$$y' = [\tilde{\alpha} y + d] \bmod \Lambda . \tag{F-8}$$

Effectively, the induced channel is equivalent to (see [55, Lemma 6])

$$y' = [v + n_{\mathrm{eff}}] \bmod \Lambda , \tag{F-9}$$

where the effective noise is given by

$$n_{\mathrm{eff}} = [(\tilde{\alpha} - 1) x + \tilde{\alpha} n] \bmod \Lambda , \tag{F-10}$$

and we note that the dither signal ensures that $x$ is uniformly distributed over the Voronoi region,
and is independent of either the information bearing signal $v$, or the noise $n$.
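The equivalence between the physical chain (F-7), (F-5), (F-8) and the modulo-additive channel (F-9), (F-10) can be checked by simulation. The following is a minimal sketch, assuming the modulo operation reduces its argument into $[-A, A)$; `mod_lattice` is a hypothetical helper, and the interference statistics are an arbitrary choice, since the scheme does not depend on them.

```python
import numpy as np


def mod_lattice(u, A):
    """Reduce u into the basic Voronoi region [-A, A)."""
    return (u + A) % (2.0 * A) - A


rng = np.random.default_rng(0)
P_x, P_n = 4.0, 0.5
A = np.sqrt(3.0 * P_x)            # lattice constant for transmit power P_x
alpha = P_x / (P_x + P_n)         # inflation factor (MMSE choice, for illustration)
n_samples = 100_000

v = rng.uniform(-A, A, n_samples)             # information bearing signal
s = 10.0 * rng.standard_normal(n_samples)     # interference known at the transmitter
d = rng.uniform(-A, A, n_samples)             # dither, shared with the receiver
noise = np.sqrt(P_n) * rng.standard_normal(n_samples)

x = mod_lattice(v - alpha * s - d, A)         # transmitter, (F-7)
y = x + s + noise                             # channel, (F-5)
y_prime = mod_lattice(alpha * y + d, A)       # receiver, (F-8)

n_eff = mod_lattice((alpha - 1.0) * x + alpha * noise, A)   # (F-10)
y_equiv = mod_lattice(v + n_eff, A)                          # (F-9)

# Compare on the circle to avoid spurious wrap-around differences of 2A.
mismatch = np.max(np.abs(mod_lattice(y_prime - y_equiv, A)))
tx_power = np.mean(x**2)
print(mismatch, tx_power)
```

The interference $s$ drops out of the equivalent channel entirely, and the empirical transmit power stays close to $P_x = A^2/3$ regardless of how strong $s$ is.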

The capacity of this channel is achieved by a uniform input distribution over the Voronoi region,
$v \sim \mathrm{Unif}\{\mathcal{V}\}$, for which the relation between the lattice constant $A$ and the transmit power $P_x$
is given by $A = \sqrt{3 P_x}$. The corresponding achievable rate is equal to the input-output mutual
information of the equivalent channel (F-9)

$$R(P_x) \triangleq I(v; y') = \log_2(2A) - h(n_{\mathrm{eff}}) = \frac{1}{2} \log_2(12 P_x) + \int_{-A}^{A} f_{n_{\mathrm{eff}}}(\zeta) \log_2 f_{n_{\mathrm{eff}}}(\zeta)\, d\zeta . \tag{F-11}$$



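The entropy of the effective noise is derived via the following observation. Denoting the "self-noise"
term by

$$Z = (\tilde{\alpha} - 1) x , \tag{F-12}$$

its pdf is given by

$$f_Z(\zeta) = \begin{cases} \frac{1}{2 (1 - \tilde{\alpha}) A} , & |\zeta| \le (1 - \tilde{\alpha}) A \\ 0 , & \text{otherwise} . \end{cases} \tag{F-13}$$

The pdf of the effective noise (F-10) is thus given by

$$f_{n_{\mathrm{eff}}}(\zeta) = \begin{cases} \sum_{i=-\infty}^{\infty} f_{\tilde{Z}}(\zeta - 2 i A) , & -A \le \zeta < A \\ 0 , & \text{otherwise} , \end{cases} \tag{F-14}$$

where $f_{\tilde{Z}}(\zeta)$ denotes the pdf of the pre-modulo noise term, which is given by the convolution of
the pdf of the self-noise (F-13) and the pdf of the scaled AWGN

$$f_{\tilde{Z}}(\zeta) = f_Z(\zeta) * f_{\tilde{\alpha} n}(\zeta) = \frac{1}{2 (1 - \tilde{\alpha}) A} \left[ Q\left( \frac{\sqrt{2}}{\tilde{\alpha}} \big( \zeta - (1 - \tilde{\alpha}) A \big) \right) - Q\left( \frac{\sqrt{2}}{\tilde{\alpha}} \big( \zeta + (1 - \tilde{\alpha}) A \big) \right) \right] . \tag{F-15}$$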



We normalized here without loss of generality the spectral level of the AWGN to $\frac{1}{2}$ per dimension
(inducing a unit noise spectral level in complex channels [50]), so that effectively $P_x$ specifies the
SNR of the original underlying complex channel model, corresponding to (F-4). The rate in (F-11)
can be optimized with respect to the inflation factor $\tilde{\alpha}$ (which is performed to obtain the numerical
results shown in Section 8). We also note that choosing $\tilde{\alpha} = 1$ corresponds to standard THP, while
another popular choice is the minimum mean-squared error (MMSE) factor (also referred to as
the "Costa factor" [1])

$$\tilde{\alpha}_{\mathrm{MMSE}} = \frac{P_x}{P_x + P_n} . \tag{F-16}$$
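The rate for continuous input can be evaluated by numerical integration. The sketch below is one possible implementation under stated assumptions: the noise variance is $P_n = 1/2$, the pre-modulo pdf is the uniform-Gaussian convolution written with $Q(\cdot)$ functions, the infinite folding sum is truncated to a finite number of terms, and the integral is computed with a midpoint rule. All function names and the grid sizes are illustrative choices.

```python
import math
import numpy as np


def q_func(x):
    """Gaussian tail function Q(x), evaluated elementwise."""
    return 0.5 * np.vectorize(math.erfc)(np.asarray(x, dtype=float) / math.sqrt(2.0))


def premodulo_pdf(zeta, alpha, A, P_n):
    """pdf of (alpha - 1) x + alpha n, with x ~ Unif[-A, A) and n ~ N(0, P_n)."""
    sigma = alpha * math.sqrt(P_n)
    a = (1.0 - alpha) * A
    if a < 1e-9:  # alpha = 1 (standard THP): no self-noise, plain Gaussian pdf
        return np.exp(-zeta**2 / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))
    return (q_func((zeta - a) / sigma) - q_func((zeta + a) / sigma)) / (2.0 * a)


def effective_noise_pdf(zeta, alpha, A, P_n, n_terms=20):
    """Fold the pre-modulo pdf into [-A, A); truncated version of the infinite sum."""
    return sum(premodulo_pdf(zeta - 2.0 * i * A, alpha, A, P_n)
               for i in range(-n_terms, n_terms + 1))


def gthp_rate_continuous(P_x, alpha, P_n=0.5, n_grid=4000):
    """Achievable rate in bits per real dimension for uniform continuous input."""
    A = math.sqrt(3.0 * P_x)
    step = 2.0 * A / n_grid
    zeta = -A + (np.arange(n_grid) + 0.5) * step          # midpoint rule on [-A, A)
    f = np.maximum(effective_noise_pdf(zeta, alpha, A, P_n), 1e-300)
    return 0.5 * math.log2(12.0 * P_x) + float(np.sum(f * np.log2(f)) * step)


P_x, P_n = 10.0, 0.5
alpha_mmse = P_x / (P_x + P_n)
rate_mmse = gthp_rate_continuous(P_x, alpha_mmse, P_n)
rate_thp = gthp_rate_continuous(P_x, 1.0, P_n)
awgn_capacity = 0.5 * math.log2(1.0 + P_x / P_n)
print(rate_thp, rate_mmse, awgn_capacity)
```

A one-dimensional search over the inflation factor (for example on a grid in $(0, 1]$) can be wrapped around `gthp_rate_continuous` to carry out the optimization mentioned above.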

Turning to discrete input with $M$-pulse amplitude modulation ($M$-PAM) (representing the
information bearing signals), the setting is equivalent to the case in which the continuous information
bearing signal considered above is quantized (cf. [57]). Instead of (F-7), the channel input is now
given by

$$x = [\mathcal{Q}(v) - \tilde{\alpha} s - d] \bmod \Lambda = [v_Q - \tilde{\alpha} s - d] \bmod \Lambda , \tag{F-17}$$

where $\mathcal{Q}(\cdot)$ denotes the nearest-neighbor uniform quantizer with step size $A$ [54], and $v$ is assumed
to be uniformly distributed over the Voronoi region. We note here that this transmission scheme
differs from the one considered in [54], where the channel input is quantized to comply with an
$M$-PAM constellation (see therein). Note also that as in the continuous setting, due to the dither
signal, the channel input $x$ is still uniformly distributed over the Voronoi region. The effective
channel can now be represented in the form (cf. (F-9))

$$y'_Q = [v_Q + n_{\mathrm{eff}}] \bmod \Lambda , \tag{F-18}$$

where the effective noise is still given by (F-10). Restricting this review to the case of binary
information bearing signals per dimension, the channel input signal is limited to the interval
$\mathcal{V} = [-A, A)$, while the quantized information bearing signal is obtained using

$$\mathcal{Q}(v) = \begin{cases} -\frac{A}{2} , & -A \le v < 0 \\ +\frac{A}{2} , & 0 \le v < A . \end{cases} \tag{F-19}$$

For consistency we retain the relation $A = \sqrt{3 P_{x_Q}}$.

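The achievable rate for binary input is given again by the mutual information

$$R(P_{x_Q}) \triangleq I(v; y'_Q) = h(y'_Q) - h(n_{\mathrm{eff}}) . \tag{F-20}$$

Note that the pdf of the random quantity inside the modulo function in (F-18) is given by

$$f_{\tilde{Y}}(\zeta) = f_{v_Q}(\zeta) * f_{\tilde{Z}}(\zeta) = \frac{1}{2} \left( \delta\left(\zeta - \frac{A}{2}\right) + \delta\left(\zeta + \frac{A}{2}\right) \right) * f_{\tilde{Z}}(\zeta) . \tag{F-21}$$

Hence, the pdf of the equivalent channel output $y'_Q$ is equal to

$$f_{y'_Q}(\zeta) = \begin{cases} \sum_{i=-\infty}^{\infty} f_{\tilde{Y}}(\zeta - 2 i A) , & -A \le \zeta < A \\ 0 , & \text{otherwise} , \end{cases} \tag{F-22}$$

and the achievable rate of (F-20) can be rewritten as

$$R(P_{x_Q}) = -\int_{-A}^{A} f_{y'_Q}(\zeta) \log_2 f_{y'_Q}(\zeta)\, d\zeta + \int_{-A}^{A} f_{n_{\mathrm{eff}}}(\zeta) \log_2 f_{n_{\mathrm{eff}}}(\zeta)\, d\zeta$$

$$= \int_{-A}^{A} \left[ f_{n_{\mathrm{eff}}}(\zeta) \log_2 f_{n_{\mathrm{eff}}}(\zeta) - f_{y'_Q}(\zeta) \log_2 f_{y'_Q}(\zeta) \right] d\zeta . \tag{F-23}$$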
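The binary-input rate can be evaluated along the same lines as the continuous one. The following is a minimal sketch under the same assumptions as before (noise variance $1/2$ by default, truncated folding sum, midpoint-rule integration), with the binary points placed at $\pm A/2$; the helper names are illustrative only.

```python
import math
import numpy as np


def q_func(x):
    """Gaussian tail function Q(x), evaluated elementwise."""
    return 0.5 * np.vectorize(math.erfc)(np.asarray(x, dtype=float) / math.sqrt(2.0))


def premodulo_pdf(zeta, alpha, A, P_n):
    """pdf of (alpha - 1) x + alpha n, with x ~ Unif[-A, A) and n ~ N(0, P_n)."""
    sigma = alpha * math.sqrt(P_n)
    a = (1.0 - alpha) * A
    if a < 1e-9:  # alpha = 1: plain Gaussian pdf
        return np.exp(-zeta**2 / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))
    return (q_func((zeta - a) / sigma) - q_func((zeta + a) / sigma)) / (2.0 * a)


def fold(pdf, zeta, A, n_terms=20):
    """Truncated modulo-folding of a pdf into [-A, A)."""
    return sum(pdf(zeta - 2.0 * i * A) for i in range(-n_terms, n_terms + 1))


def gthp_rate_binary(P_xq, alpha, P_n=0.5, n_grid=4000):
    """Achievable rate in bits per real dimension for binary input at +/- A/2."""
    A = math.sqrt(3.0 * P_xq)
    step = 2.0 * A / n_grid
    zeta = -A + (np.arange(n_grid) + 0.5) * step          # midpoint rule on [-A, A)

    def noise_pdf(z):
        return premodulo_pdf(z, alpha, A, P_n)

    def output_pdf(z):
        # Equiprobable points at +A/2 and -A/2 convolved with the pre-modulo noise.
        return 0.5 * (noise_pdf(z - A / 2.0) + noise_pdf(z + A / 2.0))

    f_n = np.maximum(fold(noise_pdf, zeta, A), 1e-300)
    f_y = np.maximum(fold(output_pdf, zeta, A), 1e-300)
    return float(np.sum(f_n * np.log2(f_n) - f_y * np.log2(f_y)) * step)


rate_low = gthp_rate_binary(0.5, 0.5)             # low SNR, MMSE inflation factor
rate_high = gthp_rate_binary(10.0, 10.0 / 10.5)   # high SNR, MMSE inflation factor
print(rate_low, rate_high)
```

As expected for binary signaling, the rate saturates just below 1 bit per dimension at high SNR.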



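The above principles can now be applied to the channel in (F-3), where the transmitter pre-cancels
using the GTHP scheme, per each transmitted symbol, the interference due to the corresponding
symbols of previously encoded users. Using (F-11) and (F-20), the achievable rate of the
$k$th user can be obtained by substituting $P_x = L_{kk}^2\, \mathsf{snr}$ for continuous input, and $P_{x_Q} = L_{kk}^2\, \mathsf{snr}$
for the binary setting, yielding, respectively, for real channels

$$\left. \frac{1}{2} \log_2\left(12 L_{kk}^2\, \mathsf{snr}\right) + \int_{-A}^{A} f_{n_{\mathrm{eff}}}(\zeta) \log_2 f_{n_{\mathrm{eff}}}(\zeta)\, d\zeta\ \right|_{A = \sqrt{3 L_{kk}^2 \mathsf{snr}}} \tag{F-24}$$

and

$$\left. \int_{-A}^{A} \left[ f_{n_{\mathrm{eff}}}(\zeta) \log_2 f_{n_{\mathrm{eff}}}(\zeta) - f_{y'_Q}(\zeta) \log_2 f_{y'_Q}(\zeta) \right] d\zeta\ \right|_{A = \sqrt{3 L_{kk}^2 \mathsf{snr}}} . \tag{F-25}$$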

To complete the analysis, it is left to derive the normalized spectral efficiency of GTHP in the
large system limit. This is obtained using the following observation (see [49, Lemma 3]).

Lemma F.1 Let $H$ be a $K \times N$ random matrix, having i.i.d. circularly symmetric zero-mean
entries with variance $\frac{1}{N}$ and finite fourth moment, and let $H^{(k)}$, $k \le K$, denote the matrix
constructed by striking out the last $K - k$ rows of $H$. Then

$$L_{kk}^2 = \frac{1}{\left[ \left( H^{(k)} H^{(k)\dagger} \right)^{-1} \right]_{kk}} , \tag{F-26}$$

and for $k, K, N \to \infty$, s.t. $\frac{K}{N} \to \alpha < \infty$ and $\frac{k}{K} \to \nu \in [0, 1)$, it follows that

$$L_{kk}^2 \xrightarrow[k, K, N \to \infty]{} L^2(\nu) \triangleq 1 - \nu\alpha , \quad \alpha \in (0, 1] . \tag{F-27}$$
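The limiting behavior of the diagonal entries can be observed on a single random matrix of moderate size. The sketch below assumes complex Gaussian entries with variance $1/N$ and reads $L_{kk}$ off a QR decomposition of $H^\dagger$ (whose triangular factor is $L^\dagger$); the dimensions are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 600, 300
alpha = K / N                                  # system load

# i.i.d. circularly symmetric complex Gaussian entries with variance 1/N.
H = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2 * N)

# H^dagger = Q1 @ R1 implies H = R1^dagger @ Q1^dagger, so |L_kk| = |R1_kk|.
_, R1 = np.linalg.qr(H.conj().T)
L_kk_sq = np.abs(np.diag(R1)) ** 2

nu = np.arange(1, K + 1) / K                   # normalized user index k / K
limit = 1.0 - nu * alpha                       # limiting value 1 - nu * alpha
mean_abs_err = float(np.mean(np.abs(L_kk_sq - limit)))
print(mean_abs_err)
```

The deviation from the limit shrinks as $N$ and $K$ grow with the ratio $K/N$ held fixed.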



Omitting subscripts, the limiting spectral efficiency is thus given for either continuous or binary
quantized input by (cf. [49, Eq. (41)])

$$C^{\mathrm{gthp}}(\mathsf{snr}) = \lim_{K, N \to \infty} \frac{1}{N} \sum_{k=1}^{K} R_k^{\mathrm{gthp}}(\mathsf{snr}) \tag{F-28}$$

$$= \lim_{K, N \to \infty} \frac{K}{N} \frac{1}{K} \sum_{k=1}^{K} R_k^{\mathrm{gthp}}(\mathsf{snr}) \tag{F-29}$$

$$= \alpha \int_0^1 R^{\mathrm{gthp}}(\nu, \mathsf{snr})\, d\nu , \tag{F-30}$$



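where for the continuous case we substitute

$$R^{\mathrm{gthp}}(\nu, \mathsf{snr}) = R_c^{\mathrm{gthp}}(\nu, \mathsf{snr}) \triangleq \left. \frac{1}{2} \log_2\left(12 L^2(\nu)\, \mathsf{snr}\right) + \int_{-A}^{A} f_{n_{\mathrm{eff}}}(\zeta) \log_2 f_{n_{\mathrm{eff}}}(\zeta)\, d\zeta\ \right|_{A = \sqrt{3 L^2(\nu) \mathsf{snr}}} , \tag{F-31}$$

and for the case of discrete binary information bearing signals we substitute

$$R^{\mathrm{gthp}}(\nu, \mathsf{snr}) = R_Q^{\mathrm{gthp}}(\nu, \mathsf{snr}) \triangleq \left. \int_{-A}^{A} \left[ f_{n_{\mathrm{eff}}}(\zeta) \log_2 f_{n_{\mathrm{eff}}}(\zeta) - f_{y'_Q}(\zeta) \log_2 f_{y'_Q}(\zeta) \right] d\zeta\ \right|_{A = \sqrt{3 L^2(\nu) \mathsf{snr}}} . \tag{F-32}$$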



The spectral efficiency for QPSK modulation satisfies (following the convention in [50]):

$$C_{\mathrm{qpsk}}^{\mathrm{gthp}}(\mathsf{snr}) = 2\, C_{\mathrm{bpsk}}^{\mathrm{gthp}}(\mathsf{snr}/2) , \tag{F-33}$$

where $C_{\mathrm{bpsk}}^{\mathrm{gthp}}(\mathsf{snr})$ is given by (F-30) and (F-32), and it can be expressed as a function of $E_b/N_0$ through
(5-9). An analogous result for the case of continuous input can be readily obtained using (F-31).

Both spectral efficiencies can be optimized with respect to the choice of the system load $\alpha$.